Re: Query interruption with MySQL 5.7 using KILL QUERY in signal handler no longer works

2016-07-12 Thread Sebastien FLAESCH

I did the following test:

My program still uses MySQL 5.7 libmysqlclient.so, but I connect now to a 
5.6.16 server.

And the SQL interrupt works fine... so I suspect there is a MySQL server issue 
in 5.7.

Seb

On 07/12/2016 01:01 PM, Sebastien FLAESCH wrote:

About:

 > B) For some reason, the program does not want to exit() - (must investigate)

In fact we use prepared statements with a sequence of mysql_stmt_init,
mysql_stmt_prepare, mysql_stmt_execute, mysql_stmt_fetch, ... and
mysql_stmt_close.

After the statement was interrupted, we try to free the MySQL statement handle
with:

mysql_stmt_close();

But this API call hangs... (strace output below)

This was working fine in older versions...

Seb

strace log:

...

sendto(3, "s\0\0\0\26select COUNT(*) from   t2 a"..., 119, 0, NULL, 0) = 119
recvfrom(3, "\f\0\0\1\0\1\0\0\0\1\0\0\0\0\0\0\36\0\0\2\3def\0\0\0\10COUN"..., 16384, 0, NULL, NULL) = 50
sendto(3, "\n\0\0\0\27\1\0\0\0\1\1\0\0\0", 14, 0, NULL, 0) = 14
recvfrom(3, 0x62905220, 16384, 0, 0, 0) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

...

When mysql_stmt_close() is called, it hangs in recvfrom():

recvfrom(3,





On 07/03/2016 06:55 PM, Sebastien FLAESCH wrote:

Hi all,

I use the following technique to cancel a long running query:

In the SIGINT signal handler, I restart a connection and I perform a

KILL QUERY mysql-process-id-of-running-query
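The interrupt-flag half of this pattern can be sketched in Python (a loose sketch, not the original C program; the actual "open a second connection and issue KILL QUERY <id>" step is only indicated in a comment, since it depends on the client library in use):

```python
import os
import signal

cancel_requested = False

def on_sigint(signum, frame):
    # The real handler (per the post) would open a SECOND connection and issue:
    #   KILL QUERY <mysql-process-id-of-running-query>
    # Here we only record that cancellation was requested.
    global cancel_requested
    cancel_requested = True

signal.signal(signal.SIGINT, on_sigint)

# Simulate the user pressing Ctrl-C while a long query is running.
os.kill(os.getpid(), signal.SIGINT)

print(cancel_requested)  # the handler has run by now, so this prints True
```

Note that whether a blocking recvfrom() in the client library is restarted after the handler returns depends on the SA_RESTART flag, which is exactly what the strace output in this thread shows.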

This was working fine with MySQL 5.6.

But with 5.7 (5.7.11), we now get a different result:

A) The query is still interrupted, but we no longer get an SQL error -1317.

B) For some reason, the program does not want to exit() - (must investigate)

Any clues?


With mysql it's working fine:

mysql> select sleep(10);
^C^C -- query aborted
+---+
| sleep(10) |
+---+
+---+
1 row in set (2.79 sec)

mysql> \q
Bye


Thanks!
Seb







--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql



Re: Query interruption with MySQL 5.7 using KILL QUERY in signal handler no longer works

2016-07-12 Thread Sebastien FLAESCH

About:

> B) For some reason, the program does not want to exit() - (must investigate)

In fact we use prepared statements with a sequence of mysql_stmt_init,
mysql_stmt_prepare, mysql_stmt_execute, mysql_stmt_fetch, ... and
mysql_stmt_close.

After the statement was interrupted, we try to free the MySQL statement handle
with:

mysql_stmt_close();

But this API call hangs... (strace output below)

This was working fine in older versions...

Seb

strace log:

...

sendto(3, "s\0\0\0\26select COUNT(*) from   t2 a"..., 119, 0, NULL, 0) = 119
recvfrom(3, "\f\0\0\1\0\1\0\0\0\1\0\0\0\0\0\0\36\0\0\2\3def\0\0\0\10COUN"..., 16384, 0, NULL, NULL) = 50
sendto(3, "\n\0\0\0\27\1\0\0\0\1\1\0\0\0", 14, 0, NULL, 0) = 14
recvfrom(3, 0x62905220, 16384, 0, 0, 0) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

...

When mysql_stmt_close() is called, it hangs in recvfrom():

recvfrom(3,





On 07/03/2016 06:55 PM, Sebastien FLAESCH wrote:

Hi all,

I use the following technique to cancel a long running query:

In the SIGINT signal handler, I restart a connection and I perform a

KILL QUERY mysql-process-id-of-running-query

This was working fine with MySQL 5.6.

But with 5.7 (5.7.11), we now get a different result:

A) The query is still interrupted, but we no longer get an SQL error -1317.

B) For some reason, the program does not want to exit() - (must investigate)

Any clues?


With mysql it's working fine:

mysql> select sleep(10);
^C^C -- query aborted
+---+
| sleep(10) |
+---+
+---+
1 row in set (2.79 sec)

mysql> \q
Bye


Thanks!
Seb







Re: Query Summary Help...

2015-10-24 Thread Mogens Melander

You need to GROUP BY those fields NOT in the aggregate function. Like:

SELECT f.id,f.name,sum(p.price)
FROM fruit f
left join purchase p on f.id = p.fruit
where p.price is not null
group by f.id,f.name;

1, 'Apples', 2
2, 'Grapes', 6.5
4, 'Kiwis', 4
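This fix can be checked with any SQL engine; a minimal sketch using Python's sqlite3 module (sample rows are invented to reproduce the totals shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruit (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (fruit INTEGER, price REAL);
INSERT INTO fruit VALUES (1,'Apples'),(2,'Grapes'),(3,'Lemons'),(4,'Kiwis');
INSERT INTO purchase VALUES (1,2),(2,2.5),(2,4),(4,4);
""")
# GROUP BY every selected column that is NOT inside an aggregate
rows = conn.execute("""
    SELECT f.id, f.name, SUM(p.price)
    FROM fruit f
    LEFT JOIN purchase p ON f.id = p.fruit
    WHERE p.price IS NOT NULL
    GROUP BY f.id, f.name
    ORDER BY f.id
""").fetchall()
print(rows)  # [(1, 'Apples', 2.0), (2, 'Grapes', 6.5), (4, 'Kiwis', 4.0)]
```

Note how id 3 (no purchases) drops out: the `p.price IS NOT NULL` filter discards the LEFT JOIN's unmatched rows.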

On 2015-10-23 04:15, Don Wieland wrote:

Hi gang,

I have a query:

SELECT
p.pk_ProductID,
p.Description,
i.Quantity

FROM invoice_invoicelines_Product p
JOIN invoice_InvoiceLines i ON p.pk_ProductID = i.fk_ProductID AND
i.fk_InvoiceID IN (1,2,3)

WHERE p.pk_ProductID IN (1,2,3);

It produces a list like the following:

1,Banana,3
2,Orange,1
2,Orange,4
3,Melon,3
3,Melon,3

I want to SUM the i.Quantity per ProductID, but I am unable to get the
scope/syntax correct. I was expecting the following would work:

SELECT
p.pk_ProductID,
p.Description,
SUM(i.Quantity)

FROM invoice_invoicelines_Product p
JOIN invoice_InvoiceLines i ON p.pk_ProductID = i.fk_ProductID AND
i.fk_InvoiceID IN (1,2,3)

WHERE p.pk_ProductID IN (1,2,3)
GROUP BY i.fk_ProductID;

but it is not working.


Little help please. Thanks!


Don Wieland
d...@pointmade.net
http://www.pointmade.net
https://www.facebook.com/pointmade.band


--
Mogens
+66 8701 33224







Re: Query Help...

2015-10-22 Thread shawn l.green



On 10/22/2015 11:48 AM, Don Wieland wrote:



On Oct 20, 2015, at 1:24 PM, shawn l.green  wrote:

Which release of MySQL are you using?


Version 5.5.45-cll


How many rows do you get if you remove the GROUP_CONCAT operator? We don't need 
to see the results. (sometimes it is a good idea to look at the raw, 
unprocessed results)

Is it possible that you are attempting to concat more values than allowed by 
--group-concat-max-len ?


When I did this I realized I was missing a GROUP BY clause

Here is the debugged working version. Thanks guys.

SELECT
ht.*,
CONCAT(o.first_name, " ", o.last_name) AS orphan,
GROUP_CONCAT(DISTINCT hti.rec_code ORDER BY hti.rec_code ASC SEPARATOR ", ") AS alloc
FROM hiv_transactions ht
LEFT JOIN tk_orphans o ON ht.orphan_id = o.orphan_id
LEFT JOIN hiv_trans_items hti ON ht.transaction_id = hti.hiv_transaction_id
WHERE ht.donor_id = 730 AND ht.tr_date BETWEEN "2015-01-01 00:00:00" AND "2015-12-31 23:59:59"
GROUP BY ht.`transaction_id`
ORDER BY ht.tr_date DESC, ht.rec_code ASC;
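The shape of the fix (GROUP BY on the transaction key, GROUP_CONCAT(DISTINCT ...) for the child codes) can be sketched with Python's sqlite3; note SQLite's group_concat supports DISTINCT but not MySQL's ORDER BY/SEPARATOR clause, and the table names here are simplified stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans (transaction_id INTEGER, donor_id INTEGER);
CREATE TABLE items (hiv_transaction_id INTEGER, rec_code TEXT);
INSERT INTO trans VALUES (1, 730), (2, 730);
INSERT INTO items VALUES (1,'A'),(1,'B'),(1,'A'),(2,'C');
""")
# One output row per transaction; duplicates collapsed by DISTINCT
rows = conn.execute("""
    SELECT t.transaction_id, GROUP_CONCAT(DISTINCT i.rec_code) AS alloc
    FROM trans t
    LEFT JOIN items i ON t.transaction_id = i.hiv_transaction_id
    GROUP BY t.transaction_id
    ORDER BY t.transaction_id
""").fetchall()
print(rows)
```

Transaction 1 gets "A,B" (the repeated 'A' collapsed), transaction 2 gets "C" — the missing GROUP BY was what collapsed everything into a single row.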

Don Wieland
d...@pointmade.net
http://www.pointmade.net
https://www.facebook.com/pointmade.band




Thank you for sharing your solution.

Best wishes,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/ 
for details.





Re: Query Summary Help...

2015-10-22 Thread Michael Dykman
I'm not at a terminal but have you tried grouping by p.pk_ProductID instead
of i.fk...? It is the actual value you are selecting as well as being on
the primary table in the query.

On Thu, Oct 22, 2015, 5:18 PM Don Wieland  wrote:

> Hi gang,
>
> I have a query:
>
> SELECT
> p.pk_ProductID,
> p.Description,
> i.Quantity
>
> FROM invoice_invoicelines_Product p
> JOIN invoice_InvoiceLines i ON p.pk_ProductID = i.fk_ProductID AND
> i.fk_InvoiceID IN (1,2,3)
>
> WHERE p.pk_ProductID IN (1,2,3);
>
> It produces a list like the following:
>
> 1,Banana,3
> 2,Orange,1
> 2,Orange,4
> 3,Melon,3
> 3,Melon,3
>
> I want to SUM the i.Quantity per ProductID, but I am unable to get the
> scope/syntax correct. I was expecting the following would work:
>
> SELECT
> p.pk_ProductID,
> p.Description,
> SUM(i.Quantity)
>
> FROM invoice_invoicelines_Product p
> JOIN invoice_InvoiceLines i ON p.pk_ProductID = i.fk_ProductID AND
> i.fk_InvoiceID IN (1,2,3)
>
> WHERE p.pk_ProductID IN (1,2,3)
> GROUP BY i.fk_ProductID;
>
> but it is not working.
>
>
> Little help please. Thanks!
>
>
> Don Wieland
> d...@pointmade.net
> http://www.pointmade.net
> https://www.facebook.com/pointmade.band
>
>
>
>
>


Re: Query Summary Help...

2015-10-22 Thread Don Wieland

> On Oct 22, 2015, at 2:41 PM, Michael Dykman  wrote:
> 
> I'm not at a terminal but have you tried grouping by p.pk_ProductID instead
> of i.fk...? It is the actual value you are selecting as well as being on
> the primary table in the query.

Yeah I tried that - actually the SUM I need is on the JOIN relationship - 
results should be:

1,Banana,3
2,Orange,5
3,Melon,6

Thanks!

Don Wieland
d...@pointmade.net
http://www.pointmade.net
https://www.facebook.com/pointmade.band






Re: Query Summary Help...

2015-10-22 Thread Michael Dykman
One more guess:

Try explicitly aliasing the fields of interest and using those aliases
exclusively throughout the rest of the expression.

SELECT
p.pk_ProductID as pid,
p.Description as dsc,
SUM(i.Quantity) as totl

FROM invoice_invoicelines_Product p
JOIN invoice_InvoiceLines i ON pid = i.fk_ProductID

WHERE pid IN (1,2,3)
AND i.fk_InvoiceID IN (1,2,3)
GROUP BY pid;

Note that I moved the invoiceID clause out of the join condition into the
where filter. The ON clause should only contain expressions of relational
interest.

On Thu, Oct 22, 2015, 6:00 PM Don Wieland  wrote:

>
> > On Oct 22, 2015, at 2:41 PM, Michael Dykman  wrote:
> >
> > I'm not at a terminal but have you tried grouping by p.pk_ProductID
> instead
> > of i.fk...? It is the actual value you are selecting as well as being on
> > the primary table in the query.
>
> Yeah I tried that - actually the SUM I need is on the JOIN relationship -
> results should be:
>
> 1,Banana,3
> 2,Orange,5
> 3,Melon,6
>
> Thanks!
>
> Don Wieland
> d...@pointmade.net
> http://www.pointmade.net
> https://www.facebook.com/pointmade.band
>
>
>
>
>


Re: Query Help...

2015-10-22 Thread Don Wieland

> On Oct 20, 2015, at 1:24 PM, shawn l.green  wrote:
> 
> Which release of MySQL are you using?

Version 5.5.45-cll

> How many rows do you get if you remove the GROUP_CONCAT operator? We don't 
> need to see the results. (sometimes it is a good idea to look at the raw, 
> unprocessed results)
> 
> Is it possible that you are attempting to concat more values than allowed by 
> --group-concat-max-len ?

When I did this I realized I was missing a GROUP BY clause

Here is the debugged working version. Thanks guys.

SELECT 
ht.*, 
CONCAT(o.first_name, " ", o.last_name) AS orphan, 
GROUP_CONCAT(DISTINCT hti.rec_code ORDER BY hti.rec_code ASC SEPARATOR ", ") AS alloc
FROM hiv_transactions ht 
LEFT JOIN tk_orphans o ON ht.orphan_id = o.orphan_id 
LEFT JOIN hiv_trans_items hti ON ht.transaction_id = hti.hiv_transaction_id 
WHERE ht.donor_id = 730 AND ht.tr_date BETWEEN "2015-01-01 00:00:00" AND "2015-12-31 23:59:59"
GROUP BY ht.`transaction_id`
ORDER BY ht.tr_date DESC, ht.rec_code ASC;

Don Wieland
d...@pointmade.net
http://www.pointmade.net
https://www.facebook.com/pointmade.band






Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-20 Thread Johan De Meersman
- Original Message -
> From: "Shawn Green" <shawn.l.gr...@oracle.com>
> Subject: Re: Query optimizer-miss with unqualified expressions, bug or 
> feature?
> 
> On a more serious note, indexes with limited cardinality are less useful
> than those with excellent cardinality. Cardinality is an approximation
> (or calculation. It depends on your storage engine) of how many unique
> values there are in the index.

On a related note, are there any plans (and could you offer a rough timeframe?) 
to include bitmap indices in MySQL?


Thanks,
Johan

-- 
Unhappiness is discouraged and will be corrected with kitten pictures.




Re: Query Help...

2015-10-20 Thread Peter Brawley

On 2015-10-20 12:54 PM, Don Wieland wrote:

Hi all,

Trying to get a query working:

SELECT
ht.*,
CONCAT(o.first_name, " ", o.last_name) AS orphan,
GROUP_CONCAT(DISTINCT hti.rec_code ORDER BY hti.rec_code ASC SEPARATOR ", ") AS alloc

FROM hiv_transactions ht

LEFT JOIN tk_orphans o ON ht.orphan_id = o.orphan_id
LEFT JOIN hiv_trans_items hti ON ht.transaction_id = hti.hiv_transaction_id

WHERE ht.donor_id = 730 AND ht.tr_date BETWEEN "2014-01-01 00:00:00" AND "2014-12-31 23:59:59"
ORDER BY ht.tr_date DESC, ht.rec_code ASC;



I am only showing one row of the “hiv_transactions” table when there are 
multiple rows.

On the GROUP_CONCAT I am trying to get a comma-delimited list of the child
rec_code values with no duplicates

Appreciate any help. Hopefully a small mod ;-)


Group_Concat() is an aggregating function, so you need to Group By the 
column(s) on which you wish to aggregate. For valid results, limit the 
selected columns to those you are aggregating on, plus columns that have 
unique values for your aggregating columns.


PB





Don Wieland
D W   D a t a






Re: Query Help...

2015-10-20 Thread shawn l.green



On 10/20/2015 1:54 PM, Don Wieland wrote:

Hi all,

Trying to get a query working:

SELECT
ht.*,
CONCAT(o.first_name, " ", o.last_name) AS orphan,
GROUP_CONCAT(DISTINCT hti.rec_code ORDER BY hti.rec_code ASC SEPARATOR ", ") AS alloc

FROM hiv_transactions ht

LEFT JOIN tk_orphans o ON ht.orphan_id = o.orphan_id
LEFT JOIN hiv_trans_items hti ON ht.transaction_id = hti.hiv_transaction_id

WHERE ht.donor_id = 730 AND ht.tr_date BETWEEN "2014-01-01 00:00:00" AND "2014-12-31 23:59:59"
ORDER BY ht.tr_date DESC, ht.rec_code ASC;



I am only showing one row of the “hiv_transactions” table when there are 
multiple rows.

On the GROUP_CONCAT I am trying to get a comma-delimited list of the child
rec_code values with no duplicates

Appreciate any help. Hopefully a small mod ;-)


Don Wieland



Which release of MySQL are you using?

How many rows do you get if you remove the GROUP_CONCAT operator? We 
don't need to see the results. (sometimes it is a good idea to look at 
the raw, unprocessed results)


Is it possible that you are attempting to concat more values than 
allowed by --group-concat-max-len ?


Yours,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/ 
for details.





Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Ben Clewett

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and 
then use two range scans: one on 'a > 0' and one on 'a < 0', taking a 
union of the results? Which might make a significant difference for some 
queries?


Ben.


On 2015-10-19 14:19, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be 
optimized by MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually interpreted as

  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and 
hence no index will be used.


Thanks,
Roy



Thanks,

Ben Clewett.











Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be optimized by
MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually interpreted as

  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and hence no index 
will be used.
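The equivalence of the two predicates (though not the index choice, which is what the optimizer discussion is about) is easy to check; a quick sketch with Python's sqlite3 standing in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (i INTEGER PRIMARY KEY, a INTEGER NOT NULL);
CREATE INDEX a_idx ON t(a);
INSERT INTO t (a) VALUES (1),(0),(1),(0),(1);
""")
# "WHERE a" is truthiness, i.e. "a <> 0"; for 0/1 data that equals "a = 1"
bare   = conn.execute("SELECT i FROM t WHERE a").fetchall()
truthy = conn.execute("SELECT i FROM t WHERE a <> 0").fetchall()
exact  = conn.execute("SELECT i FROM t WHERE a = 1").fetchall()
print(bare == truthy == exact)  # True: same rows either way
```

The rows agree; the difference Roy describes is only in whether the optimizer can map the `<> 0` form onto an index range.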


Thanks,
Roy



Thanks,

Ben Clewett.










Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Shawn,

On 19.10.15 22.33, shawn l.green wrote:



On 10/19/2015 3:48 PM, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and
then use two range scans: one on 'a > 0' and one on 'a < 0', taking a
union of the results? Which might make a significant difference for some
queries?


That is correct. However, if the substitution type for BOOLEAN was
UNSIGNED TINYINT instead of TINYINT, the range 'a < 0' would not even be
needed, and the quite cumbersome UNION would be avoided. But the best
solution would of course be a two-valued boolean type, where 'a <> 0'
would easily be transformed to a = 1. It would also mean that statistics
for the columns would be better, with TINYINT each value has the
estimated probability 1/256, whereas a boolean value would have
probability 1/2.




256 possible values of BOOLEAN? I've heard of fuzzy logic but this is awesome!
Some new literal value names to consider: maybe, sort_of, nearly_always,
certainly, practically_never, likely, ...
*


Well, it is the practical consequence of using TINYINT as the substitution type 
for BOOLEAN...





On a more serious note, indexes with limited cardinality are less useful than
those with excellent cardinality. Cardinality is an approximation (or
calculation. It depends on your storage engine) of how many unique values there
are in the index.

If the Optimizer estimates (based on a calculation based on the Cardinality)
that more than about 30% of a table would need to be retrieved in random order
based on an index, then that index is disallowed.  Why? Because the physical
disk overhead of doing random access averages just slightly more than 3x the
overhead used to scan a much larger block of data.

http://dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html

An index on just a Boolean value would have at best a cardinality of 2. So, any
indexes on Boolean values should include other columns to help the index become
more selective.

http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html


You are right about the index use, so it would be interesting only with a 
significant skew, say 10% TRUE values. However, the optimizer is not only about 
indexing, but also about calculating the filtering effect of a predicate. Using 
a true BOOLEAN rather than a TINYINT would give a better estimate of the 
filtering effect, and thus of the estimated number of rows as the outcome of a 
query.





*Actually, fuzzy logic has lots of practical application in real world
situations. They are just not using the MySQL BOOLEAN data type to store the
value for comparison.



Thanks,
Roy




Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread shawn l.green



On 10/19/2015 3:48 PM, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and
then use two range scans: one on 'a > 0' and one on 'a < 0', taking a
union of the results? Which might make a significant difference for some
queries?


That is correct. However, if the substitution type for BOOLEAN was
UNSIGNED TINYINT instead of TINYINT, the range 'a < 0' would not even be
needed, and the quite cumbersome UNION would be avoided. But the best
solution would of course be a two-valued boolean type, where 'a <> 0'
would easily be transformed to a = 1. It would also mean that statistics
for the columns would be better, with TINYINT each value has the
estimated probability 1/256, whereas a boolean value would have
probability 1/2.




256 possible values of BOOLEAN? I've heard of fuzzy logic but this is 
awesome! Some new literal value names to consider: maybe, sort_of, 
nearly_always, certainly, practically_never, likely, ...

*


On a more serious note, indexes with limited cardinality are less useful 
than those with excellent cardinality. Cardinality is an approximation 
(or calculation. It depends on your storage engine) of how many unique 
values there are in the index.


If the Optimizer estimates (based on a calculation based on the 
Cardinality) that more than about 30% of a table would need to be 
retrieved in random order based on an index, then that index is 
disallowed.  Why? Because the physical disk overhead of doing random 
access averages just slightly more than 3x the overhead used to scan a 
much larger block of data.


http://dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html

An index on just a Boolean value would have at best a cardinality of 2. 
So, any indexes on Boolean values should include other columns to help 
the index become more selective.


http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
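The suggestion to widen a low-cardinality boolean index with additional columns can be sketched as follows (sqlite3 standing in for MySQL/InnoDB; the table, column, and index names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL, tr_date TEXT);
-- boolean column first (equality), then a selective column (range)
CREATE INDEX flag_tr_idx ON tx(flag, tr_date);
INSERT INTO tx (flag, tr_date) VALUES
  (1,'2015-01-10'),(0,'2015-02-11'),(1,'2015-03-12'),(0,'2015-04-13');
""")
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id FROM tx WHERE flag = 1 AND tr_date >= '2015-02-01'
""").fetchall()
# The last column of each plan row describes the chosen access path.
print([row[-1] for row in plan])
```

The plan reports a search using the composite index: the equality on the two-valued `flag` narrows to one half, and `tr_date` makes the index selective within it.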


*Actually, fuzzy logic has lots of practical application in real world 
situations. They are just not using the MySQL BOOLEAN data type to store 
the value for comparison.


--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/ 
for details.





Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and then use
two range scans: one on 'a > 0' and one on 'a < 0', taking a union of the
results? Which might make a significant difference for some queries?


That is correct. However, if the substitution type for BOOLEAN was UNSIGNED 
TINYINT instead of TINYINT, the range 'a < 0' would not even be needed, and the 
quite cumbersome UNION would be avoided. But the best solution would of course 
be a two-valued boolean type, where 'a <> 0' would easily be transformed to a = 
1. It would also mean that statistics for the columns would be better, with 
TINYINT each value has the estimated probability 1/256, whereas a boolean value 
would have probability 1/2.


Thanks,
Roy



Ben.


On 2015-10-19 14:19, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be optimized by
MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually interpreted as

  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and hence no
index will be used.

Thanks,
Roy



Thanks,

Ben Clewett.












Re: Query with variable number of columns?

2014-10-08 Thread Peter Brawley

On 2014-10-08 1:38 PM, Jan Steinman wrote:

I often use CASE WHEN ... to pivot tables. For example, splitting sales data by 
year:

   SELECT
     s_product.name AS `Product`,
     SUM(CASE WHEN YEAR(sales.Date)='2007' THEN sales.Total ELSE NULL END) AS `2007`,
     SUM(CASE WHEN YEAR(sales.Date)='2008' THEN sales.Total ELSE NULL END) AS `2008`,
     SUM(CASE WHEN YEAR(sales.Date)='2009' THEN sales.Total ELSE NULL END) AS `2009`,
     ...
   WHERE dynamic predicate that only has results in one year

However, this pattern will often result in numerous empty columns -- empties 
that would not be there had the table not been pivoted.

What techniques do *you* use for avoiding this anti-pattern?


Non-procedural SQL is an incomplete computer language; it can't do that. 
MySQL stored procedures are less incomplete, and can do it, but they're 
awkward. I use the app language (e.g. PHP) to implement such logic.
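The "do it in the app language" approach can be sketched in a few lines of Python against sqlite3 (strftime('%Y', ...) plays the role of MySQL's YEAR(); names loosely follow the example above): first query the distinct years that actually contain data, then generate one SUM(CASE ...) column per such year, so empty columns never appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (name TEXT, Date TEXT, Total REAL);
INSERT INTO sales VALUES
  ('Beans','2007-05-01',10),('Beans','2009-02-01',5),
  ('Corn', '2007-07-01',7);
""")
# 1) find the years that actually have data
years = [r[0] for r in conn.execute(
    "SELECT DISTINCT strftime('%Y', Date) FROM sales ORDER BY 1")]
# 2) build one pivot column per non-empty year
cols = ", ".join(
    "SUM(CASE WHEN strftime('%Y', Date)='{y}' THEN Total END) AS \"{y}\"".format(y=y)
    for y in years)
sql = "SELECT name AS Product, {} FROM sales GROUP BY name ORDER BY name".format(cols)
result = conn.execute(sql).fetchall()
print(result)  # [('Beans', 10.0, 5.0), ('Corn', 7.0, None)]
```

Only 2007 and 2009 columns are generated; 2008, which has no rows, simply never becomes a column.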


PB

-


Am I limited to using a separate programming language (PHP, in this case) with 
a separate COUNT(*) query for each possible column, then CASEing the generation 
of the column SQL? Seems awfully ugly!

Thanks in advance for any insight offered!

(And the following came up at random... perhaps I'll just live with a bunch of 
empty columns...)

 In attempting to fix any system, we may damage another that is working 
perfectly well. -- David Holmgren
 Jan Steinman, EcoReality Co-op 








Re: Query with variable number of columns?

2014-10-08 Thread hsv
 2014/10/08 11:38 -0700, Jan Steinman 
However, this pattern will often result in numerous empty columns -- empties 
that would not be there had the table not been pivoted. 


 2014/10/08 16:42 -0500, Peter Brawley 
MySQL stored procedures are less incomplete, and can do it, but they're 
awkward. 

From a webpage link posted on this very list, I learnt of a means of (yes, 
clumsily) using an SQL procedure to build PREPAREd statements that pivot. It 
entails computing twice: once to find the good fields, once to pivot and show them.

One selects from a virtual table:
(SELECT DISTINCT YEAR(sales.Date) AS Y FROM ...) AS g

For each good Y one wants this generated (I use ANSI mode, with more PL1 than 
C):
'SUM(IF(YEAR(sales.Date)=' || Y || ', sales.Total, NULL)) AS "' || Y || '"'

The outcome is something like this:
set @yearSal = (SELECT 'SELECT s_product.name AS Product, ' || 
GROUP_CONCAT('SUM(IF(YEAR(sales.Date)=' || Y || ', sales.Total, NULL)) AS "' || 
Y || '"') || '
FROM ...'
FROM (SELECT DISTINCT YEAR(sales.Date) AS Y FROM ...) AS g);
PREPARE YearSal FROM @yearSal;
EXECUTE YearSal;

Unhappily, PREPARE takes only user-defined variables, and its prepared 
statement, too, is exposed to the procedure's caller. If the prepared statement 
is SELECT ... INTO ..., only user-defined variables are allowed after INTO. 
One who knows the names can learn something about the procedure's working.





Re: Query regarding implementation of parallel-replication

2014-09-10 Thread wagnerbianchi.com
It's good to know. Keep up the good work, cheers!!



--
*Wagner Bianchi, MySQL Database Specialist*
Mobile:  +55.31.8654.9510
E-mail:  m...@wagnerbianchi.com
Twitter: @wagnerbianchijr

2014-09-06 3:01 GMT-03:00 Ajay Garg ajaygargn...@gmail.com:

 Hi Wagner.

 That is what I did as the last resort, and that is only what solved the
 issue.


 Thanks.

 On Fri, Sep 5, 2014 at 1:52 AM, wagnerbianchi.com m...@wagnerbianchi.com
 wrote:
  You can try these steps:
 
  1-) Stop the slave and write down the replication coordinates reported in
  MySQL's error log (*very important step*);
  2-) Issue the `reset slave` command on MySQL Slave;
  3-) Issue the CHANGE MASTER TO considering the replication coordinates
  you've just written down on step 1;
  4-) Give replication a start;
  5-) Check if the issue has gone away.
 
  If you're not comfortable to do that, just share the SHOW SLAVE STATUS
  output with us.
 
  Let us know how's it going, cheers!!
 
 
 
 
  --
  Wagner Bianchi, MySQL Database Specialist
  Mobile:  +55.31.8654.9510
  E-mail:  m...@wagnerbianchi.com
  Twitter: @wagnerbianchijr
 
 
  2014-09-04 7:24 GMT-03:00 Ajay Garg ajaygargn...@gmail.com:
 
  Hi all.
 
  Unfortunately, I have run into the logs, as described at
  http://bugs.mysql.com/bug.php?id=71495
 
  Unfortunately, the issue does not go away, even after reverting back
  to slave-parallel-workers=0 in my.cnf, and restarting the mysql
  instance.
 
 
  Any quick idea, as to how we may get the mysql+replication up and
  running (even with the plain old non-multi-threaded mode)?
 
 
 
 
  On Tue, Sep 2, 2014 at 12:57 PM, Ajay Garg ajaygargn...@gmail.com
 wrote:
   Thanks Akshay for the reply.
  
   On Tue, Sep 2, 2014 at 12:25 PM, Akshay Suryavanshi
   akshay.suryavansh...@gmail.com wrote:
   Hello Ajay,
  
    I tried testing slave-parallel-workers a few months ago; what I can
    surely tell you is that it's still under development, and at that time
    it needed some critical bug fixing.
  
    It is helpful in situations where each schema has an even workload. The
    case you mentioned above does not: db2 is getting a different type of
    load than the others. In that case the other slave workers should be
    able to proceed with their workload, as opposed to db2, which is still
    executing the long-running statement. Now just imagine what happens if
    we try to take a backup: what binlog position should be captured? What
    will SHOW SLAVE STATUS print? This is where it needs development; I
    tried testing backups on it, but there is no concrete documentation on
    what position it would fetch.
  
   db2-statement-1 (very, very long-running)
   db2-statement-2 (short-running)
  
    About the above scenario: db2-statement-2 will wait for the
    long-running statement-1 to complete.
  
   Surely.. !! :)
  
  
   However, my concern is how this tracking is done.
   That is, how is the db-wise segregation of statements done (from a
   single-binlog-file originally coming onto the slave) ?
  
   If this segregation is not done, then I cannot think of a way on how
   things would scale up, like for example, when the slave-relay-log-file
   contains a random mix of statements from tens of different databases.
  
  
  
   Any pointers on the actual current implementation of this db-wise
   statements-segregation will be a great confidence-booster !!  :)
  
  
  
   Thanks and Regards,
   Ajay
  
  
However db2-statement-2 can be picked up by
   any other sql worker thread.
  
   This is a good feature added in mysql, however still needs to go
   through lot
   of testing. Please share your observation and findings in case it
   differs
   from the above.
  
   Cheers!!!
   Akshay
  
  
   On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com
   wrote:
  
   Hi all.
  
  
   We have replication set-up, where we cater to HUUGEE amounts of
 data.
   Since quite some time, we have been facing issues wherein the slave
   lags behind master quite a lot.
  
  
   So, yesterday we were able to setup parallel replication, by
   incorporating the following changes ::
  
   a)
   To begin with, partitioned some tables into dedicated databases.
  
   b)
   Set up the slave-parallel-workers parameter.
  
  
    The above seems to work functionally fine, but we have one doubt/query
    about the scalability of this solution.
  
  
  
  
    First, I will jot down the flow as far as I understand (please correct
    if wrong) ::
  
   
    Even in the parallel-replication scenario, the master writes all the
    binlog (combined for all databases) in just one file, which then gets
    passed onto the slave as a single file. Thereafter, all the
    replication commands (combined for all databases) are written
    sequentially onto one slave-relay file.
  
    Thereafter, as per the documentation, the slave-SQL-Thread acts as the
    manager, handing over commands to worker-threads depending upon the
    databases on 

Re: Query on some MySQL-internals

2014-09-07 Thread Ajay Garg
Hi Martin.

Thanks for the reply.


As I had mentioned, we are running both the instances since last 6
years or so, and the records are inserted/deleted on both the
instances.

So, we ran SHOW TABLE STATUS LIKE 'XX' \G on both the
instances, and the following are the outputs (here XX is the table
upon which the OPTIMIZE command was run).

Also note that the outputs are after the OPTIMIZE command had been run
on the respective instance-tables ::


1)
Instance 1, which showed massive improvement in INSERT query
completion times after OPTIMIZE command was run on table XX::

db1> show table status like 'XX' \G
*** 1. row ***
   Name: XX
 Engine: InnoDB
Version: 10
 Row_format: Compact
   Rows: 12380147
 Avg_row_length: 473
Data_length: 5865701376
Max_data_length: 0
   Index_length: 522043392
  Data_free: 91226112
 Auto_increment: NULL
Create_time: NULL
Update_time: NULL
 Check_time: NULL
  Collation: latin1_swedish_ci
   Checksum: NULL
 Create_options: partitioned
Comment:
1 row in set (0.08 sec)


2)
Instance 2, which showed no improvement in INSERT query completion
times, after running OPTIMIZE command on table XX ::


db2> show table status like 'XX' \G
*** 1. row ***
   Name: XX
 Engine: InnoDB
Version: 10
 Row_format: Compact
   Rows: 13189570
 Avg_row_length: 407
Data_length: 5376540672
Max_data_length: 0
   Index_length: 518553600
  Data_free: 36700160
 Auto_increment: NULL
Create_time: NULL
Update_time: NULL
 Check_time: NULL
  Collation: latin1_swedish_ci
   Checksum: NULL
 Create_options: partitioned
Comment:
1 row in set (0.24 sec)
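
The same counters pasted above can be pulled for all tables at once from information_schema, which makes the two instances easier to compare side by side. A minimal sketch; 'mydb' is a placeholder schema name, and the columns are the standard information_schema.TABLES columns:

```sql
-- Sketch: reclaimable space per table, from the same counters shown above.
-- 'mydb' is a placeholder schema name.
SELECT table_name,
       data_length,
       data_free,
       ROUND(100 * data_free / (data_length + index_length), 1) AS pct_free
FROM   information_schema.tables
WHERE  table_schema = 'mydb'
ORDER  BY data_free DESC;
```

By this measure the two outputs above are actually close: instance 1 shows roughly 91 MB of Data_free against ~6.4 GB of data+index (well under 2%), and instance 2 roughly 37 MB against ~5.9 GB (under 1%), so neither table looks badly fragmented after OPTIMIZE.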



Thanks and Regards,
Ajay

On Sat, Sep 6, 2014 at 8:06 PM, Martin Gainty mgai...@hotmail.com wrote:




 Date: Sat, 6 Sep 2014 14:26:22 +0530
 Subject: Query on some MySQL-internals
 From: ajaygargn...@gmail.com
 To: mysql@lists.mysql.com

 Hi all.


 We are facing a very strange scenario.

 We have two mysql-instances running on the same machine, and they had
 been running functionally fine since about 6 years or so (catering to
 millions of records per day).

 However, since last few days, we were experiencing some elongated
 slowness on both the instances.
 So, we decided to OPTIMIZE TABLE slow_table on both the instances.

 We first ran the command on one instance.
 That speeded things up massively (a SELECT COUNT(*) that was earlier
 taking 45 minutes was now running in less than 3 minutes).


 We then ran the command on the second instance. However, that seemed
 to have no effect.
 We ran the command again (on the same instance); again it had no effect.



 What could be the reason of this strange behavior?
 Both the instances run under fairly the same load
 MG> How do you extract the metrics to determine whether the second
 instance is handling the same load as the first instance?
 MG> vmstat?
 MG> iostat?
 MG> SHOW GLOBAL STATUS?

 and both instances
 are mounted on the same partition (obviously, all the directories are
 different).


 Hoping for some light on this strange issue.



 Thanks and Regards,
 Ajay

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe: http://lists.mysql.com/mysql




-- 
Regards,
Ajay

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: Query on some MySQL-internals

2014-09-07 Thread Martin Gainty
Good Afternoon Ajay

 

I'm not seeing any giant deltas between the two sets of metrics, except I
did notice the elapsed time to run the metric on the second instance was 3
times slower.

Any chance we can do pathping from your present location:
1)pathping SQLServerInstance1

2)pathping SQLServerInstance2

to determine if there is an intervening router that is slowing down the second 
instance?


Does anyone have advice to help Ajay track down why his second instance is 
non-performant?
Martin 

 Date: Sun, 7 Sep 2014 23:06:09 +0530
 Subject: Re: Query on some MySQL-internals
 From: ajaygargn...@gmail.com
 To: mgai...@hotmail.com
 CC: mysql@lists.mysql.com
 
 Hi Martin.
 
 Thanks for the reply.
 
 
 As I had mentioned, we are running both the instances since last 6
 years or so, and the records are inserted/deleted on both the
 instances.
 
 So, we ran SHOW TABLE STATUS LIKE 'XX' \G on both the
 instances, and the following are the outputs (here XX is the table
 upon which the OPTIMIZE command was run).
 
 Also note that the outputs are after the OPTIMIZE command had been run
 on the respective instance-tables ::
 
 
 1)
 Instance 1, which showed massive improvement in INSERT query
 completion times after OPTIMIZE command was run on table XX::
 
 db1> show table status like 'XX' \G
 *** 1. row ***
 Name: XX
 Engine: InnoDB
 Version: 10
 Row_format: Compact
 Rows: 12380147
 Avg_row_length: 473
 Data_length: 5865701376
 Max_data_length: 0
 Index_length: 522043392
 Data_free: 91226112
 Auto_increment: NULL
 Create_time: NULL
 Update_time: NULL
 Check_time: NULL
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: partitioned
 Comment:
 1 row in set (0.08 sec)
 
 
 2)
 Instance 2, which showed no improvement in INSERT query completion
 times, after running OPTIMIZE command on table XX ::
 
 
 db2> show table status like 'XX' \G
 *** 1. row ***
 Name: XX
 Engine: InnoDB
 Version: 10
 Row_format: Compact
 Rows: 13189570
 Avg_row_length: 407
 Data_length: 5376540672
 Max_data_length: 0
 Index_length: 518553600
 Data_free: 36700160
 Auto_increment: NULL
 Create_time: NULL
 Update_time: NULL
 Check_time: NULL
 Collation: latin1_swedish_ci
 Checksum: NULL
 Create_options: partitioned
 Comment:
 1 row in set (0.24 sec)
 
 
 
 Thanks and Regards,
 Ajay
 
 On Sat, Sep 6, 2014 at 8:06 PM, Martin Gainty mgai...@hotmail.com wrote:
 
 
 
 
  Date: Sat, 6 Sep 2014 14:26:22 +0530
  Subject: Query on some MySQL-internals
  From: ajaygargn...@gmail.com
  To: mysql@lists.mysql.com
 
  Hi all.
 
 
  We are facing a very strange scenario.
 
  We have two mysql-instances running on the same machine, and they had
  been running functionally fine since about 6 years or so (catering to
  millions of records per day).
 
  However, since last few days, we were experiencing some elongated
  slowness on both the instances.
  So, we decided to OPTIMIZE TABLE slow_table on both the instances.
 
  We first ran the command on one instance.
  That speeded things up massively (a SELECT COUNT(*) that was earlier
  taking 45 minutes was now running in less than 3 minutes).
 
 
  We then ran the command on the second instance. However, that seemed
  to have no effect.
  We ran the command again (on the same instance); again it had no effect.
 
 
 
  What could be the reason of this strange behavior?
  Both the instances run under fairly the same load
  MG> How do you extract the metrics to determine whether the second
  instance is handling the same load as the first instance?
  MG> vmstat?
  MG> iostat?
  MG> SHOW GLOBAL STATUS?
 
  and both instances
  are mounted on the same partition (obviously, all the directories are
  different).
 
 
  Hoping for some light on this strange issue.
 
 
 
  Thanks and Regards,
  Ajay
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe: http://lists.mysql.com/mysql
 
 
 
 
 -- 
 Regards,
 Ajay
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe: http://lists.mysql.com/mysql
 
  

Re: Query regarding implementation of parallel-replication

2014-09-06 Thread Ajay Garg
Hi Wagner.

That is what I did as a last resort, and that is the only thing that solved the issue.


Thanks.

On Fri, Sep 5, 2014 at 1:52 AM, wagnerbianchi.com m...@wagnerbianchi.com 
wrote:
 You can try these steps:

 1-) Stop the slave and write down the replication coordinates, taking them
 from MySQL's error log (*very important step*);
 2-) Issue the `reset slave` command on the MySQL slave;
 3-) Issue the CHANGE MASTER TO considering the replication coordinates
 you've just written down in step 1;
 4-) Start replication;
 5-) Check if the issue has gone away.

 If you're not comfortable to do that, just share the SHOW SLAVE STATUS
 output with us.

 Let us know how's it going, cheers!!




 --
 Wagner Bianchi, MySQL Database Specialist
 Mobile:  +55.31.8654.9510
 E-mail:  m...@wagnerbianchi.com
 Twitter: @wagnerbianchijr


 2014-09-04 7:24 GMT-03:00 Ajay Garg ajaygargn...@gmail.com:

 Hi all.

 Unfortunately, I have run into the log errors described at
 http://bugs.mysql.com/bug.php?id=71495

 Unfortunately, the issue does not go away, even after reverting back
 to slave-parallel-workers=0 in my.cnf, and restarting the mysql
 instance.


 Any quick idea, as to how we may get the mysql+replication up and
 running (even with the plain old non-multi-threaded mode)?




 On Tue, Sep 2, 2014 at 12:57 PM, Ajay Garg ajaygargn...@gmail.com wrote:
  Thanks Akshay for the reply.
 
  On Tue, Sep 2, 2014 at 12:25 PM, Akshay Suryavanshi
  akshay.suryavansh...@gmail.com wrote:
  Hello Ajay,
 
  I tried testing the slave-parallel-workers a few months ago; what I can
  surely tell you is that it's still under development, and at that time
  it needed some critical bug fixing.
 
  It is helpful in situations where each schema has an even workload. The
  case you mentioned above doesn't have that: DB2 is getting a different
  type of load than the others. In that case the other slave workers
  should be able to proceed with their workload, as opposed to db2, which
  is still executing the long-running statement. Now just imagine what
  happens if we try to take a backup: what binlog position should be
  captured? What will SHOW SLAVE STATUS print? This is where it needs
  development; I tried testing backups on it, but there is no concrete
  documentation on what position it would fetch.
 
  db2-statement-1 (very, very long-running)
  db2-statement-2 (short-running)
 
  About the above scenario: the next statement, db2-statement-2, will
  wait for the long-running statement-1 to complete.
 
  Surely.. !! :)
 
 
  However, my concern is how this tracking is done.
  That is, how is the db-wise segregation of statements done (from a
  single-binlog-file originally coming onto the slave) ?
 
  If this segregation is not done, then I cannot think of a way on how
  things would scale up, like for example, when the slave-relay-log-file
  contains a random mix of statements from tens of different databases.
 
 
 
  Any pointers on the actual current implementation of this db-wise
  statements-segregation will be a great confidence-booster !!  :)
 
 
 
  Thanks and Regards,
  Ajay
 
 
   However, db2-statement-2 can be picked up by
  any other SQL worker thread.
 
  This is a good feature added in MySQL; however, it still needs to go
  through a lot of testing. Please share your observations and findings
  in case they differ from the above.
 
  Cheers!!!
  Akshay
 
 
  On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com
  wrote:
 
  Hi all.
 
 
  We have a replication set-up where we cater to HUUGEE amounts of data.
  For quite some time, we have been facing issues wherein the slave
  lags behind the master quite a lot.
 
 
  So, yesterday we were able to setup parallel replication, by
  incorporating the following changes ::
 
  a)
  To begin with, partitioned some tables into dedicated databases.
 
  b)
  Set up the slave-parallel-workers parameter.
 
 
  The above seems to work functionally fine, but we have one doubt/query
  about the scalability of this solution.
 
 
 
 
  First, I will jot down the flow as far as I understand (please correct
  if wrong) ::
 
  
  Even in parallel-replication scenario, the master writes all the
  binlog (combined for all databases) in just one file, which then gets
  passed onto the slave as single-file itself. Thereafter, all the
  replication commands (combined for all databases) are written
  sequentially onto one slave-relay file.
 
  Thereafter, as per the documentation, the slave-SQL-Thread acts as the
  manager, handing over commands to worker-threads depending upon the
  databases on which the commands run.
  
 
 
 
  So far, so good.
  However, what would happen if the slave-relay file contains the
  following
  ::
 
 
  db1-statement-1 (short-running)
  db2-statement-1 (very, very long-running)
  db2-statement-2 (short-running)
  db1-statement-2 (short-running)
  db1-statement-3 (short-running)
 
 
  We will be grateful if someone could please clarify how the
  above statements will be 

RE: Query on some MySQL-internals

2014-09-06 Thread Martin Gainty




 Date: Sat, 6 Sep 2014 14:26:22 +0530
 Subject: Query on some MySQL-internals
 From: ajaygargn...@gmail.com
 To: mysql@lists.mysql.com
 
 Hi all.
 
 
 We are facing a very strange scenario.
 
 We have two mysql-instances running on the same machine, and they had
 been running functionally fine since about 6 years or so (catering to
 millions of records per day).
 
 However, since last few days, we were experiencing some elongated
 slowness on both the instances.
 So, we decided to OPTIMIZE TABLE slow_table on both the instances.
 
 We first ran the command on one instance.
 That speeded up things massively (select count(*) that was earlier
 taking 45 minutes was now running in less than 3 minutes).
 
 
 We then ran the command on the second instance. However, that seemed
 to have no effect.
 We ran the command again (on the same instance); again it had no effect.
 
 
 
 What could be the reason of this strange behavior?
 Both the instances run under fairly the same load
MG> How do you extract the metrics to determine whether the second instance
is handling the same load as the first instance?
MG> vmstat?
MG> iostat?
MG> SHOW GLOBAL STATUS?
 and both instances
 are mounted on the same partition (obviously, all the directories are
 different).
 
 
 Hoping for some light on this strange issue.
 
 
 
 Thanks and Regards,
 Ajay
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql
 
  

Re: Query regarding implementation of parallel-replication

2014-09-04 Thread Ajay Garg
Hi all.

Unfortunately, I have run into the log errors described at
http://bugs.mysql.com/bug.php?id=71495

Unfortunately, the issue does not go away, even after reverting back
to slave-parallel-workers=0 in my.cnf, and restarting the mysql
instance.


Any quick idea, as to how we may get the mysql+replication up and
running (even with the plain old non-multi-threaded mode)?




On Tue, Sep 2, 2014 at 12:57 PM, Ajay Garg ajaygargn...@gmail.com wrote:
 Thanks Akshay for the reply.

 On Tue, Sep 2, 2014 at 12:25 PM, Akshay Suryavanshi
 akshay.suryavansh...@gmail.com wrote:
 Hello Ajay,

 I tried testing the slave-parallel-workers a few months ago; what I can
 surely tell you is that it's still under development, and at that time it
 needed some critical bug fixing.

 It is helpful in situations where each schema has an even workload. The case
 you mentioned above doesn't have that: DB2 is getting a different type of
 load than the others. In that case the other slave workers should be able to
 proceed with their workload, as opposed to db2, which is still executing the
 long-running statement. Now just imagine what happens if we try to take a
 backup: what binlog position should be captured? What will SHOW SLAVE STATUS
 print? This is where it needs development; I tried testing backups on it,
 but there is no concrete documentation on what position it would fetch.

 db2-statement-1 (very, very long-running)
 db2-statement-2 (short-running)

 About the above scenario: the next statement, db2-statement-2, will wait
 for the long-running statement-1 to complete.

 Surely.. !! :)


 However, my concern is how this tracking is done.
 That is, how is the db-wise segregation of statements done (from a
 single-binlog-file originally coming onto the slave) ?

 If this segregation is not done, then I cannot think of a way on how
 things would scale up, like for example, when the slave-relay-log-file
 contains a random mix of statements from tens of different databases.



 Any pointers on the actual current implementation of this db-wise
 statements-segregation will be a great confidence-booster !!  :)



 Thanks and Regards,
 Ajay


  However, db2-statement-2 can be picked up by
 any other SQL worker thread.

 This is a good feature added in MySQL; however, it still needs to go through
 a lot of testing. Please share your observations and findings in case they
 differ from the above.

 Cheers!!!
 Akshay


 On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com wrote:

 Hi all.


 We have a replication set-up where we cater to HUUGEE amounts of data.
 For quite some time, we have been facing issues wherein the slave
 lags behind the master quite a lot.


 So, yesterday we were able to setup parallel replication, by
 incorporating the following changes ::

 a)
 To begin with, partitioned some tables into dedicated databases.

 b)
 Set up the slave-parallel-workers parameter.


 The above seems to work functionally fine, but we have one doubt/query
 about the scalability of this solution.




 First, I will jot down the flow as far as I understand (please correct
 if wrong) ::

 
 Even in parallel-replication scenario, the master writes all the
 binlog (combined for all databases) in just one file, which then gets
 passed onto the slave as single-file itself. Thereafter, all the
 replication commands (combined for all databases) are written
 sequentially onto one slave-relay file.

 Thereafter, as per the documentation, the slave-SQL-Thread acts as the
 manager, handing over commands to worker-threads depending upon the
 databases on which the commands run.
 



 So far, so good.
 However, what would happen if the slave-relay file contains the following
 ::


 db1-statement-1 (short-running)
 db2-statement-1 (very, very long-running)
 db2-statement-2 (short-running)
 db1-statement-2 (short-running)
 db1-statement-3 (short-running)


 We will be grateful if someone could please clarify how the
 above statements will be managed amongst the Manager and the
 Worker-Threads (let's say there is just one worker-thread per db)?

 In particular, does the Manager thread create internal
 slave-relay-log-files, one per database's statements?



 Thanks and Regards,
 Ajay

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql





 --
 Regards,
 Ajay



-- 
Regards,
Ajay

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query regarding implementation of parallel-replication

2014-09-04 Thread wagnerbianchi.com
You can try these steps:

1-) Stop the slave and write down the replication coordinates, taking them
from MySQL's error log (*very important step*);
2-) Issue the `reset slave` command on the MySQL slave;
3-) Issue the CHANGE MASTER TO considering the replication coordinates
you've just written down in step 1;
4-) Start replication;
5-) Check if the issue has gone away.
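
Concretely, steps 1 through 4 above might look like this on the slave. A sketch only: the log file name and position are placeholders to be replaced with the coordinates taken from your own error log in step 1.

```sql
-- Sketch: placeholder coordinates, not real values.
STOP SLAVE;
RESET SLAVE;            -- connection settings (host, user, password) are
                        -- preserved in memory by RESET SLAVE in 5.x;
                        -- otherwise re-specify them in CHANGE MASTER TO.
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysql-bin.000123',   -- placeholder from error log
    MASTER_LOG_POS  = 4;                    -- placeholder from error log
START SLAVE;
SHOW SLAVE STATUS\G
```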

If you're not comfortable to do that, just share the SHOW SLAVE STATUS
output with us.

Let us know how's it going, cheers!!




--
*Wagner Bianchi, MySQL Database Specialist*
Mobile:  +55.31.8654.9510
E-mail:  m...@wagnerbianchi.com
Twitter: @wagnerbianchijr


2014-09-04 7:24 GMT-03:00 Ajay Garg ajaygargn...@gmail.com:

 Hi all.

 Unfortunately, I have run into the log errors described at
 http://bugs.mysql.com/bug.php?id=71495

 Unfortunately, the issue does not go away, even after reverting back
 to slave-parallel-workers=0 in my.cnf, and restarting the mysql
 instance.


 Any quick idea, as to how we may get the mysql+replication up and
 running (even with the plain old non-multi-threaded mode)?




 On Tue, Sep 2, 2014 at 12:57 PM, Ajay Garg ajaygargn...@gmail.com wrote:
  Thanks Akshay for the reply.
 
  On Tue, Sep 2, 2014 at 12:25 PM, Akshay Suryavanshi
  akshay.suryavansh...@gmail.com wrote:
  Hello Ajay,
 
  I tried testing the slave-parallel-workers a few months ago; what I can
  surely tell you is that it's still under development, and at that time
  it needed some critical bug fixing.
 
  It is helpful in situations where each schema has an even workload. The
  case you mentioned above doesn't have that: DB2 is getting a different
  type of load than the others. In that case the other slave workers
  should be able to proceed with their workload, as opposed to db2, which
  is still executing the long-running statement. Now just imagine what
  happens if we try to take a backup: what binlog position should be
  captured? What will SHOW SLAVE STATUS print? This is where it needs
  development; I tried testing backups on it, but there is no concrete
  documentation on what position it would fetch.
 
  db2-statement-1 (very, very long-running)
  db2-statement-2 (short-running)
 
  About the above scenario: the next statement, db2-statement-2, will
  wait for the long-running statement-1 to complete.
 
  Surely.. !! :)
 
 
  However, my concern is how this tracking is done.
  That is, how is the db-wise segregation of statements done (from a
  single-binlog-file originally coming onto the slave) ?
 
  If this segregation is not done, then I cannot think of a way on how
  things would scale up, like for example, when the slave-relay-log-file
  contains a random mix of statements from tens of different databases.
 
 
 
  Any pointers on the actual current implementation of this db-wise
  statements-segregation will be a great confidence-booster !!  :)
 
 
 
  Thanks and Regards,
  Ajay
 
 
   However, db2-statement-2 can be picked up by
  any other SQL worker thread.
 
  This is a good feature added in MySQL; however, it still needs to go
  through a lot of testing. Please share your observations and findings
  in case they differ from the above.
 
  Cheers!!!
  Akshay
 
 
  On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com
 wrote:
 
  Hi all.
 
 
  We have a replication set-up where we cater to HUUGEE amounts of data.
  For quite some time, we have been facing issues wherein the slave
  lags behind the master quite a lot.
 
 
  So, yesterday we were able to setup parallel replication, by
  incorporating the following changes ::
 
  a)
  To begin with, partitioned some tables into dedicated databases.
 
  b)
  Set up the slave-parallel-workers parameter.
 
 
  The above seems to work functionally fine, but we have one doubt/query
  about the scalability of this solution.
 
 
 
 
  First, I will jot down the flow as far as I understand (please correct
  if wrong) ::
 
  
  Even in parallel-replication scenario, the master writes all the
  binlog (combined for all databases) in just one file, which then gets
  passed onto the slave as single-file itself. Thereafter, all the
  replication commands (combined for all databases) are written
  sequentially onto one slave-relay file.
 
  Thereafter, as per the documentation, the slave-SQL-Thread acts as the
  manager, handing over commands to worker-threads depending upon the
  databases on which the commands run.
  
 
 
 
  So far, so good.
  However, what would happen if the slave-relay file contains the
 following
  ::
 
 
  db1-statement-1 (short-running)
  db2-statement-1 (very, very long-running)
  db2-statement-2 (short-running)
  db1-statement-2 (short-running)
  db1-statement-3 (short-running)
 
 
   We will be grateful if someone could please clarify how the
   above statements will be managed amongst the Manager and the
   Worker-Threads (let's say there is just one worker-thread per db)?
 
   In particular, does the Manager thread create internal
   slave-relay-log-files, one per 

Re: Query regarding implementation of parallel-replication

2014-09-02 Thread Ajay Garg
Ping !! :)

On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com wrote:
 Hi all.


 We have a replication set-up where we cater to HUUGEE amounts of data.
 For quite some time, we have been facing issues wherein the slave
 lags behind the master quite a lot.


 So, yesterday we were able to setup parallel replication, by
 incorporating the following changes ::

 a)
 To begin with, partitioned some tables into dedicated databases.

 b)
 Set up the slave-parallel-workers parameter.


 The above seems to work functionally fine, but we have one doubt/query
 about the scalability of this solution.




 First, I will jot down the flow as far as I understand (please correct
 if wrong) ::

 
 Even in parallel-replication scenario, the master writes all the
 binlog (combined for all databases) in just one file, which then gets
 passed onto the slave as single-file itself. Thereafter, all the
 replication commands (combined for all databases) are written
 sequentially onto one slave-relay file.

 Thereafter, as per the documentation, the slave-SQL-Thread acts as the
 manager, handing over commands to worker-threads depending upon the
 databases on which the commands run.
 
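
For reference, the schema-based scheduling described above is switched on with a single variable in MySQL 5.6. A minimal sketch; slave_parallel_workers is the real 5.6 variable name, and the value 4 is an arbitrary choice:

```sql
-- Sketch: enable 5.6 schema-based parallel apply on the slave.
-- The value 4 is arbitrary; tune to the number of busy databases.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;
```

In 5.6 each worker applies events for a disjoint set of databases, which is why the per-database partitioning in step (a) matters for this to help.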



 So far, so good.
 However, what would happen if the slave-relay file contains the following ::


 db1-statement-1 (short-running)
 db2-statement-1 (very, very long-running)
 db2-statement-2 (short-running)
 db1-statement-2 (short-running)
 db1-statement-3 (short-running)


 We will be grateful if someone could please clarify how the
 above statements will be managed amongst the Manager and the
 Worker-Threads (let's say there is just one worker-thread per db)?

 In particular, does the Manager thread create internal
 slave-relay-log-files, one per database's statements?



 Thanks and Regards,
 Ajay



-- 
Regards,
Ajay

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query regarding implementation of parallel-replication

2014-09-02 Thread Ajay Garg
Thanks Akshay for the reply.

On Tue, Sep 2, 2014 at 12:25 PM, Akshay Suryavanshi
akshay.suryavansh...@gmail.com wrote:
 Hello Ajay,

 I tried testing the slave-parallel-workers a few months ago; what I can
 surely tell you is that it's still under development, and at that time it
 needed some critical bug fixing.

 It is helpful in situations where each schema has an even workload. The case
 you mentioned above doesn't have that: DB2 is getting a different type of
 load than the others. In that case the other slave workers should be able to
 proceed with their workload, as opposed to db2, which is still executing the
 long-running statement. Now just imagine what happens if we try to take a
 backup: what binlog position should be captured? What will SHOW SLAVE STATUS
 print? This is where it needs development; I tried testing backups on it,
 but there is no concrete documentation on what position it would fetch.

 db2-statement-1 (very, very long-running)
 db2-statement-2 (short-running)

About the above scenario: the next statement, db2-statement-2, will wait
for the long-running statement-1 to complete.

Surely.. !! :)


However, my concern is how this tracking is done.
That is, how is the db-wise segregation of statements done (from a
single-binlog-file originally coming onto the slave) ?

If this segregation is not done, then I cannot think of a way on how
things would scale up, like for example, when the slave-relay-log-file
contains a random mix of statements from tens of different databases.



Any pointers on the actual current implementation of this db-wise
statements-segregation will be a great confidence-booster !!  :)



Thanks and Regards,
Ajay


 However, db2-statement-2 can be picked up by
 any other SQL worker thread.

 This is a good feature added in MySQL; however, it still needs to go through
 a lot of testing. Please share your observations and findings in case they
 differ from the above.

 Cheers!!!
 Akshay


 On Mon, Sep 1, 2014 at 8:27 AM, Ajay Garg ajaygargn...@gmail.com wrote:

 Hi all.


 We have a replication set-up where we cater to HUUGEE amounts of data.
 For quite some time, we have been facing issues wherein the slave
 lags behind the master quite a lot.


 So, yesterday we were able to setup parallel replication, by
 incorporating the following changes ::

 a)
 To begin with, partitioned some tables into dedicated databases.

 b)
 Set up the slave-parallel-workers parameter.


 The above seems to work functionally fine, but we have one doubt/query
 about the scalability of this solution.




 First, I will jot down the flow as far as I understand (please correct
 if wrong) ::

 
 Even in parallel-replication scenario, the master writes all the
 binlog (combined for all databases) in just one file, which then gets
 passed onto the slave as single-file itself. Thereafter, all the
 replication commands (combined for all databases) are written
 sequentially onto one slave-relay file.

 Thereafter, as per the documentation, the slave-SQL-Thread acts as the
 manager, handing over commands to worker-threads depending upon the
 databases on which the commands run.
 



 So far, so good.
 However, what would happen if the slave-relay file contains the following
 ::


 db1-statement-1 (short-running)
 db2-statement-1 (very, very long-running)
 db2-statement-2 (short-running)
 db1-statement-2 (short-running)
 db1-statement-3 (short-running)


 We will be grateful if someone could please clarify how the
 above statements will be managed amongst the Manager and the
 Worker-Threads (let's say there is just one worker-thread per db)?

 In particular, does the Manager thread create internal
 slave-relay-log-files, one per database's statements?



 Thanks and Regards,
 Ajay

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql





-- 
Regards,
Ajay

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query time taken on disk

2014-07-14 Thread Reindl Harald


Am 14.07.2014 12:48, schrieb Satendra:
 Hi there, I'm struggling to find the total time taken by a database query
 on the disk. As I understand, when a database query starts execution it
 takes some time inside the database engine and some time to seek the
 result from disk (if that is not in cache/buffer).
 
 Can anybody from the group please suggest any clue about the execution time
 on the disk?

mysql can't know this in any useful way

disk can be anything: real disk access, VFS caches, and so on, layers
the application knows nothing about



signature.asc
Description: OpenPGP digital signature


Re: Query time taken on disk

2014-07-14 Thread Keith Murphy
Satendra,

Google SHOW PROFILE, as it may give you all the information that you need.
There are a lot more details in the performance_schema if you want to dig
into it, but they can be quite difficult to get out.  Here is one place to
start if you want to pursue that angle:
http://www.markleith.co.uk/2011/05/23/monitoring-mysql-io-latency-with-performance_schema/
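
As a concrete starting point for the performance_schema route, file-IO wait totals can be read straight out of the file summary tables. A sketch, assuming performance_schema is enabled; column names are as in MySQL 5.6, and timer values are in picoseconds:

```sql
-- Sketch: top file-IO wait sources, aggregated by event class.
-- Requires performance_schema = ON; column names as in MySQL 5.6.
SELECT event_name,
       count_star,
       sum_timer_wait / 1e12 AS wait_seconds   -- picoseconds -> seconds
FROM   performance_schema.file_summary_by_event_name
ORDER  BY sum_timer_wait DESC
LIMIT  5;
```

Note this is aggregate per event class since server start, not per query, so it shows where IO time goes overall rather than what one statement cost.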

keith


On Mon, Jul 14, 2014 at 5:59 AM, Reindl Harald h.rei...@thelounge.net
wrote:



 Am 14.07.2014 12:48, schrieb Satendra:
  Hi there, I'm struggling to find the total time taken by a database query
  on the disk. As I understand, when a database query starts execution it
  takes some time inside the database engine and some time to seek the
  result from disk (if that is not in cache/buffer).
 
  Can anybody from the group please suggest any clue about the execution
  time on the disk?

 mysql can't know this in any useful way

 disk can be anything: real disk access, VFS and other caches that
 the application layer knows nothing about




-- 



(c) 850-449-1912
(f)  423-930-8646


Re: Query time taken on disk

2014-07-14 Thread Morgan Tocker
Hi Satendra,

On Jul 14, 2014, at 3:48 AM, Satendra stdra...@gmail.com wrote:

 Hi there, I'm struggling to find the total time taken by a database query
 on the disk? As I understand, when a database query starts execution it takes
 some time inside the database engine and some time to seek the result from
 disk (if that is not in cache/buffer).
 
 Can anybody from the group please suggest any clue about the execution time
 on the disk?

I have a performance_schema example demonstrating total IO wait time for a 
workload:
http://www.tocker.ca/2014/02/18/todays-practical-use-case-for-performance-schema.html

Preparing this data non-aggregated (per query) is always a little bit 
difficult:
- With select statements there is read ahead.
- With write statements there is redo logging (which is grouped together with 
other statements).

Maybe someone else on the list has better ideas on how to accommodate this?

- Morgan



Re: Query time taken on disk

2014-07-14 Thread greg . lane

Hi Satendra,


On 7/14/2014 5:48 AM, Satendra wrote:

Hi there, I'm struggling to find the total time taken by a database query
on the disk? As I understand, when a database query starts execution it takes
some time inside the database engine and some time to seek the result from
disk (if that is not in cache/buffer).

Can anybody from the group please suggest any clue about the execution time
on the disk?

Many thanks
Stdranwl



As stated in other emails, you can use the performance_schema.  Mark 
Leith has provided a stable version of ps_helper, now called the sys 
schema; it can be obtained on GitHub here: 
https://github.com/MarkLeith/mysql-sys .  There is quite a bit of help, 
with examples, in the README.md.  You can also check into the experimental 
portion of it called dbahelper, also on GitHub: 
https://github.com/MarkLeith/dbahelper .  I think you will be able to 
find what you are looking for utilizing these tools, which make 
performance schema much easier to use and understand.


Greg




RE: Query Help

2013-04-19 Thread Ilya Kazakevich
Hello Richard, 

the count(*) for each week of 2013 so that I end up with:
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html

Ilya.





Re: Query Help

2013-04-19 Thread Larry Martell
On Fri, Apr 19, 2013 at 8:24 AM, Richard Reina gatorre...@gmail.com wrote:
 Hello All,

 Happy Friday! I know how to do the following query:

select count(*) from sales where WEEK(sale_date)=15 AND
 YEAR(sale_date)=2013;

 But can someone tell me how I can do a query that will give me:

 the count(*) for each week of 2013 so that I end up with:

 WEEK | COUNT
 1   | 22
 2   | 31
 3   | 29
 etc...

select WEEK(sale_date), count(*) from sales where YEAR(sale_date)=2013
group by WEEK(sale_date);
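
Larry's GROUP BY answer can be sanity-checked outside MySQL. Below is a minimal,
self-contained sketch using Python's sqlite3 with hypothetical data; SQLite has no
WEEK()/YEAR(), so strftime('%W') and strftime('%Y') stand in for them (MySQL's
default WEEK() mode numbers weeks slightly differently):

```python
import sqlite3

# Hypothetical stand-in for the sales table, run against an in-memory
# SQLite database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales (sale_date) VALUES (?)",
    [("2013-01-07",), ("2013-01-08",), ("2013-01-14",), ("2013-02-04",)],
)

# Analog of: SELECT WEEK(sale_date), COUNT(*) FROM sales
#            WHERE YEAR(sale_date)=2013 GROUP BY WEEK(sale_date);
rows = conn.execute(
    """
    SELECT CAST(strftime('%W', sale_date) AS INTEGER) AS week, COUNT(*)
    FROM sales
    WHERE strftime('%Y', sale_date) = '2013'
    GROUP BY week
    ORDER BY week
    """
).fetchall()
print(rows)  # one (week, count) row per week that had sales
```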




Re: Query Help

2013-04-19 Thread Richard Reina
Perfect! Thank you Larry et all.

Have a great weekend.


2013/4/19 Larry Martell larry.mart...@gmail.com

 On Fri, Apr 19, 2013 at 8:24 AM, Richard Reina gatorre...@gmail.com
 wrote:
  Hello All,
 
  Happy Friday! I know how to do the following query:
 
 select count(*) from sales where WEEK(sale_date)=15 AND
  YEAR(sale_date)=2013;
 
  But can someone tell me how I can do a query that will give me:
 
  the count(*) for each week of 2013 so that I end up with:
 
  WEEK | COUNT
  1   | 22
  2   | 31
  3   | 29
  etc...

 select WEEK(sale_date), count(*) from sales where YEAR(sale_date)=2013
 group by WEEK(sale_date);



Re: Query help -

2013-03-31 Thread william drescher

On 3/31/2013 7:32 AM, william drescher wrote:

I have a table, schedule:
CREATE TABLE `schedule` (
   `schedule_id` mediumint(9) NOT NULL AUTO_INCREMENT,
   `provider` varchar(15) NOT NULL,
   `apptTime` datetime NOT NULL,
   `location` varchar(10) NOT NULL,
   `duration` smallint(5) unsigned NOT NULL,
   `standing_script` mediumint(9) DEFAULT NULL,
   `appt_status` char(1) NOT NULL,
   `patient_number` mediumint(9) NOT NULL,
   `notify` smallint(6) DEFAULT NULL,
   `comment` varchar(80) DEFAULT NULL,
   `history` varchar(200) DEFAULT NULL,
   `posted` tinyint(1) NOT NULL DEFAULT '0',
   PRIMARY KEY (`schedule_id`),
   UNIQUE KEY `patient` (`patient_number`,`apptTime`,`schedule_id`),
   UNIQUE KEY `provider` (`provider`,`apptTime`,`schedule_id`),
   KEY `standing` (`standing_script`),
   KEY `posted` (`posted`,`provider`,`apptTime`)
) ENGINE=InnoDB  DEFAULT CHARSET=ascii;

all of which can be ignored except for 'provider' and apptTime.

I want to query the database and have the result be only the next
appointment for 'patient_number' with each provider (the doctor).

eg:
2013-04-04 JSmith
2013-04-20 WJones

where the database contains:
2013-04-04 JSmith
2013-04-10 JSmith
2013-04-17 Jsmith
2013-04-20 WJones
2013-04-24 JSmith
etc

I can get a list of future appointments for 1 patient, but can't
figure out how to just get the first for each provider (there
might be 1..5 providers)

Any suggestions will be appreciated.
--bill


This will be a seldom used query and the schedule database is 
relatively small, so overhead is not a big deal.


--bill






Re: Query help - Solved

2013-03-31 Thread william drescher


of course, Group By

bill
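
For the archives, the "Group By" fix can be sketched as follows, using Python's
sqlite3 with hypothetical data (in MySQL the same MIN()/GROUP BY query applies,
with apptTime >= NOW() for "future only"):

```python
import sqlite3

# Earliest appointment per provider via MIN() + GROUP BY.
# Table and rows are hypothetical stand-ins for the schedule table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule (provider TEXT, apptTime TEXT, patient_number INTEGER)"
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?)",
    [
        ("JSmith", "2013-04-04", 7), ("JSmith", "2013-04-10", 7),
        ("JSmith", "2013-04-17", 7), ("WJones", "2013-04-20", 7),
        ("JSmith", "2013-04-24", 7),
    ],
)

rows = conn.execute(
    """
    SELECT provider, MIN(apptTime) AS next_appt
    FROM schedule
    WHERE patient_number = ? AND apptTime >= ?
    GROUP BY provider
    ORDER BY next_appt
    """,
    (7, "2013-04-01"),
).fetchall()
print(rows)  # [('JSmith', '2013-04-04'), ('WJones', '2013-04-20')]
```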





RE: Query Resulting error

2013-01-07 Thread Rick James
Don't use DOUBLE (or FLOAT) for monetary values.  Use DECIMAL(...).  Two 
decimal places ( ,2) suffice for most countries, but not all.

DOUBLE values have been rounded when converted from decimal to binary.  
Multiplying creates another rounding.  The end results may look the same when 
converted from binary back to decimal and displayed to only two decimal places, but 
the underlying values may be !=.
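
The effect Rick describes is easy to reproduce outside MySQL. A short Python
sketch (the thread's exact numbers are not reused here, since not every decimal
pair happens to collide after rounding; 0.1 + 0.2 vs 0.3 is the classic case):

```python
from decimal import Decimal

# Binary doubles that print identically at two decimal places
# need not compare equal -- the mechanism behind the thread's surprise.
total = 0.3            # the "stored" DOUBLE value
computed = 0.1 + 0.2   # the "recomputed" DOUBLE value

print(f"{computed:.2f}", f"{total:.2f}")  # 0.30 0.30 -- display matches
print(computed == total)                  # False -- underlying doubles differ
print(repr(computed))                     # 0.30000000000000004

# Exact base-10 arithmetic (DECIMAL semantics) behaves as expected:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```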

 -Original Message-
 From: Trimurthy [mailto:trimur...@tulassi.com]
 Sent: Saturday, December 29, 2012 4:26 AM
 To: mysql@lists.mysql.com
 Subject: Query Resulting error
 
 hi,
   i am working with mysql 5.1.36 and i wrote the following query.
 
select total,quantity*price from sales where total != quantity*price
 and salesid=122;
 
 OUTPUT:
 
 total     quantity*price
 330.46    330.46
 
 here quantity is 15.5 and price is 21.32
 
 both the values and data types are double(total,quantity and price).
 can
 anyone tell me why this is happening.
 
 
 
 
 
 
 
 Thanks  Kind Regards,
 TRIMURTHY
 




Re: Query Resulting error

2012-12-29 Thread Mike O'Krongli
Hi 

To me it looks like quantity is being multiplied by the price and then added to 
total.

Try something like this

SELECT total, (quantity*price) AS QP FROM sales WHERE salesid=122 HAVING total != QP


On 2012-12-29, at 7:25 AM, Trimurthy wrote:

 hi,
  i am working with mysql 5.1.36 and i wrote the following query.
 
   select total,quantity*price from sales where total != quantity*price and 
 salesid=122;
 
   OUTPUT:
 
   total     quantity*price
   330.46    330.46
 
 here quantity is 15.5 and price is 21.32
 
 both the values and data types are double(total,quantity and price). can 
 anyone tell me why this is happening.
 
 
 
 
 
 
 
 Thanks  Kind Regards,
 TRIMURTHY
 





Re: query running very slow, need a little help

2012-12-04 Thread Akshay Suryavanshi
Hi,

A subquery with an IN clause is not a good idea. If you want to tune this
query, try adding an index on the table accessed in the inner query
(credits). A composite index on (success, promoter_id) would be sufficient;
the optimizer will then use this index for the WHERE clause and as a
covering index for the column promoter_id.

This should improve performance by orders of magnitude.

Still, we would recommend turning this subquery into a join, which will
perform faster even if there are many records retrieved from the inner
query.

Hope this helps, let us know the results.

Regards,
Akshay S
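
The rewrite Akshay recommends is an anti-join. Here is a self-contained
Python/sqlite3 toy (a cut-down hypothetical schema, not the poster's real
tables) showing that the two forms return the same rows:

```python
import sqlite3

# Sketch of the NOT IN subquery vs. LEFT JOIN ... IS NULL (anti-join)
# rewrite, on a cut-down stand-in schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promoters (id INTEGER PRIMARY KEY, email TEXT, active INTEGER);
    CREATE TABLE credits (promoter_id INTEGER, success INTEGER);
    -- the composite index suggested for the inner query
    CREATE INDEX idx_credits_success_promoter ON credits (success, promoter_id);
    INSERT INTO promoters VALUES (1, 'a@x', 1), (2, 'b@x', 1), (3, 'c@x', 0);
    INSERT INTO credits VALUES (1, 1), (2, 0);
""")

subquery = conn.execute("""
    SELECT email FROM promoters
    WHERE id NOT IN (SELECT promoter_id FROM credits WHERE success = 1)
      AND active = 1
""").fetchall()

antijoin = conn.execute("""
    SELECT p.email
    FROM promoters p
    LEFT JOIN credits c ON c.promoter_id = p.id AND c.success = 1
    WHERE c.promoter_id IS NULL AND p.active = 1
""").fetchall()

print(subquery, antijoin)  # [('b@x',)] [('b@x',)]
```

(Note the two forms only stay equivalent while credits.promoter_id contains no
NULLs; NOT IN behaves surprisingly in the presence of NULL.)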

On Wed, Dec 5, 2012 at 10:24 AM, Paul Nowosielski paulnowosiel...@yahoo.com
 wrote:

 Hi,

 I'm running this query:

  mysql> SELECT email FROM promoters WHERE id NOT IN (SELECT promoter_id
  FROM credits WHERE success = 1) AND active = 1;
  Empty set (31.89 sec)

 it's returning an empty set and takes over 30 seconds to return.

 mysql> describe promoters;
 +---------------------------+------------------+------+-----+-------------------+----------------+
 | Field                     | Type             | Null | Key | Default           | Extra          |
 +---------------------------+------------------+------+-----+-------------------+----------------+
 | id                        | int(11) unsigned | NO   | PRI | NULL              | auto_increment |
 | company_name              | varchar(40)      | YES  |     | NULL              |                |
 | first_name                | varchar(40)      | YES  |     | NULL              |                |
 | last_name                 | varchar(40)      | YES  |     | NULL              |                |
 | address                   | varchar(40)      | YES  |     | NULL              |                |
 | zip                       | varchar(10)      | YES  |     | NULL              |                |
 | city                      | varchar(40)      | YES  |     | NULL              |                |
 | country                   | varchar(40)      | YES  |     | NULL              |                |
 | phone                     | varchar(20)      | YES  |     | NULL              |                |
 | email                     | varchar(100)     | YES  | UNI | NULL              |                |
 | website                   | varchar(100)     | YES  |     | NULL              |                |
 | payments_id               | varchar(10)      | YES  | MUL | NULL              |                |
 | password                  | varchar(100)     | YES  |     | NULL              |                |
 | active                    | tinyint(1)       | YES  | MUL | NULL              |                |
 | activation_key            | varchar(50)      | YES  |     | NULL              |                |
 | new_email                 | varchar(100)     | YES  |     | NULL              |                |
 | new_email_activation_key  | varchar(50)      | YES  |     | NULL              |                |
 | registered                | timestamp        | YES  |     | CURRENT_TIMESTAMP |                |
 | referral                  | int(10) unsigned | YES  |     | NULL              |                |
 | whitelabel_beginner_modus | tinyint(1)       | YES  |     | 1                 |                |
 +---------------------------+------------------+------+-----+-------------------+----------------+
 20 rows in set (0.00 sec)

 mysql> describe credits;
 +---------------+---------------------+------+-----+-------------------+----------------+
 | Field         | Type                | Null | Key | Default           | Extra          |
 +---------------+---------------------+------+-----+-------------------+----------------+
 | id            | int(11) unsigned    | NO   | PRI | NULL              | auto_increment |
 | type          | tinyint(1) unsigned | NO   |     | NULL              |                |
 | credits       | int(11)             | YES  |     | NULL              |                |
 | success       | tinyint(1)          | YES  | MUL | NULL              |                |
 | profit        | float               | NO   |     | NULL              |                |
 | price         | float               | NO   |     | NULL              |                |
 | date          | timestamp           | NO   | MUL | CURRENT_TIMESTAMP |                |
 | user_id       | int(11) unsigned    | NO   |     | NULL              |                |
 | promoter_id   | int(10) unsigned    | YES  | MUL | NULL              |                |
 | referrer      | varchar(10)         | YES  |     | NULL              |                |
 | domain_id     | int(11) unsigned    | NO   |     | NULL              |                |
 | string        | varchar(100)        | YES  |     | NULL              |                |
 | client_info   | varchar(200)        | YES  |     | NULL              |                |
 | promoter_paid | tinyint(1)          | YES  |     | NULL              |                |
 | status        | tinyint(4)          | YES  |     | NULL              |                |
 | seconds       | 

RE: Query Plan Analyzer

2012-11-26 Thread Rick James
Yes and no.  It takes a lot of experience to get beyond what EXPLAIN gives us.  
EXPLAIN says what it does, but usually one wants to know what to do.  To that 
end, I have written some tips:
http://mysql.rjweb.org/doc.php/ricksrots
http://mysql.rjweb.org/doc.php/index1
But if you have one of the classic problems, you need to think out of the 
box.  Examples:  EAV, Pagination, Big deletes, nearest pizza parlors, lat/long 
searching, latest news articles, UUID indexing, picking a random row.  The 
above links lead to other pages that discuss efficient approaches to those 
gnarly tasks.

I find Profiling to be useless, since it seems to always show 95% of the time 
in "copying"; this says nothing actionable.

EXPLAIN is useful in confirming that the query will be executed in the way you 
think it ought to be -- Using this index or that, hitting the tables in a 
particular order, Using index (meaning that the data did not need to be 
touched), Using join buffer (often bad).  I repeatedly tell people that 
using filesort/temp-table is not necessarily evil; think what it must take to 
do
   GROUP BY z ORDER BY x

If you want an analysis of a particular query, please provide
SHOW CREATE TABLE -- for engine, datatypes, and indexes
SHOW TABLE STATUS -- for size
EXPLAIN SELECT -- for the optimizer's plan
EXPLAIN PARTITIONS SELECT -- if partitioned.

Oh, another note...  Akiban's EXPLAIN is far more detailed than MySQL's.  (But 
then, it is doing some tricky things.)


 -Original Message-
 From: Cabbar Duzayak [mailto:cab...@gmail.com]
 Sent: Thursday, November 22, 2012 12:45 AM
 To: mysql@lists.mysql.com
 Subject: Re: Query Plan Analyzer
 
 Is this a joke?
 
 On Thu, Nov 22, 2012 at 10:20 AM, Zhigang Zhang
 zzgang2...@gmail.comwrote:
 
  By experience!
 
  -Original Message-
  From: Cabbar Duzayak [mailto:cab...@gmail.com]
  Sent: Thursday, November 22, 2012 3:13 PM
  To: mysql@lists.mysql.com
  Subject: Query Plan Analyzer
 
  Hi All,
 
  Is there a way of looking at how mysql builds the query plan and
  executes it for a given query? EXPLAIN is definitely a useful tool,
  but it is not exact (shows approximations as far as I can see),
  furthermore I want something like how the predicates were applied
  (before or after JOINS), which indexes were used, etc.
 
  Basically, Oracle's EXPLAIN PLAN and SQL TRACE do this, and it is
 very
  very useful for optimizing queries. Don't get me wrong, I love mysql,
  but with mysql, you optimize queries more by trial and error, instead
  of understanding what exactly is going on.
 
  Thanks a ton.
 
 




RE: Query Plan Analyzer

2012-11-22 Thread Zhigang Zhang
By experience!

-Original Message-
From: Cabbar Duzayak [mailto:cab...@gmail.com] 
Sent: Thursday, November 22, 2012 3:13 PM
To: mysql@lists.mysql.com
Subject: Query Plan Analyzer

Hi All,

Is there a way of looking at how mysql builds the query plan and executes
it for a given query? EXPLAIN is definitely a useful tool, but it is not
exact (it shows approximations as far as I can see); furthermore, I want
to see things like how the predicates were applied (before or after JOINs),
which indexes were used, etc.

Basically, Oracle's EXPLAIN PLAN and SQL TRACE do this, and it is very very
useful for optimizing queries. Don't get me wrong, I love mysql, but with
mysql, you optimize queries more by trial and error, instead of
understanding what exactly is going on.

Thanks a ton.





Re: Query Plan Analyzer

2012-11-22 Thread Cabbar Duzayak
Is this a joke?

On Thu, Nov 22, 2012 at 10:20 AM, Zhigang Zhang zzgang2...@gmail.comwrote:

 By experience!

 -Original Message-
 From: Cabbar Duzayak [mailto:cab...@gmail.com]
 Sent: Thursday, November 22, 2012 3:13 PM
 To: mysql@lists.mysql.com
 Subject: Query Plan Analyzer

 Hi All,

 Is there a way of looking at how mysql builds the query plan and executes
 it for a given query? EXPLAIN is definitely a useful tool, but it is not
 exact (shows approximations as far as I can see), furthermore I want
 something like how the predicates were applied (before or after JOINS),
 which indexes were used, etc.

 Basically, Oracle's EXPLAIN PLAN and SQL TRACE do this, and it is very very
 useful for optimizing queries. Don't get me wrong, I love mysql, but with
 mysql, you optimize queries more by trial and error, instead of
 understanding what exactly is going on.

 Thanks a ton.




Re: Query Plan Analyzer

2012-11-22 Thread Johan De Meersman
- Original Message -
 From: Cabbar Duzayak cab...@gmail.com
 
 Is there a way of looking at how mysql builds the query plan and
 executes it for a given query? EXPLAIN is definitely a useful tool, but it is
 not exact (shows approximations as far as I can see), furthermore I want
 something like how the predicates were applied (before or after
 JOINS), which indexes were used, etc.
 
 Basically, Oracle's EXPLAIN PLAN and SQL TRACE do this, and it is
 very very useful for optimizing queries. Don't get me wrong, I love mysql, but
 with mysql, you optimize queries more by trial and error, instead of
 understanding what exactly is going on.

The Percona Toolset has pt-visual-explain and some other useful tools.

There's also SHOW PROFILE in mysql itself; see 
http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html for more about that.

All in all, though, I'm not sure you can go quite as deep as you can in Oracle.


-- 
Linux Bier Wanderung 2012, now also available in Belgium!
August, 12 to 19, Diksmuide, Belgium - http://lbw2012.tuxera.be




Re: Query Optimization

2012-11-16 Thread Benaya Paul
SELECT * FROM XYX WHERE VAL IN ('BLUE','RED','GREEN'); query will make it
faster, if the field is ENUM


On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar
sb_akarmar...@yahoo.comwrote:

 Hi All,

 Consider a scenario, I have table XYZ which contains value follow
 BLUE
 RED
 GREEN
 NULL

 following are queries we can use get this values

 1. SELECT * FROM XYX WHERE VAL IN ('BLUE','RED','GREEN');
 2. SELECT * FROM XYZ WHERE VAL IS NOT NULL
 3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL='RED' OR VAL='GREEN'
 and more

 So which one is good in terms of optimization. I guess, 1 and 3 are
 similar in term of formation.


 --Anupam




-- 
Thanks  Regards,
P.Benaya Paul

http://www.codeasearch.com
http://www.iwannasearch.com


RE: Query Optimization

2012-11-16 Thread Rick James
It depends on the distribution of the 4 'values' in that field.
If the cardinality is poor, then INDEX(VAL) won't be used, and they will all do 
a table scan.

 -Original Message-
 From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
 Sent: Friday, November 16, 2012 12:36 AM
 To: mysql@lists.mysql.com
 Subject: Query Optimization
 
 Hi All,
 
 Consider a scenario, I have table XYZ which contains value follow BLUE RED
 GREEN NULL
 
 following are queries we can use get this values
 
 1. SELECT * FROM XYX WHERE VAL IN ('BLUE','RED','GREEN'); 2. SELECT * FROM
 XYZ WHERE VAL IS NOT NULL 3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR
 VAL='RED' OR VAL='GREEN'
 and more
 
 So which one is good in terms of optimization. I guess, 1 and 3 are similar
 in term of formation.
 
 
 --Anupam




Re: query tuning

2012-11-15 Thread James W. McNeely
Rick,

This is incredibly useful, thanks for taking the time to share with me. I am 
really learning stuff here!

Thanks,

Jim

On Nov 14, 2012, at 5:12 PM, Rick James wrote:

 A PRIMARY KEY is a KEY, so the second of these is redundant:
  PRIMARY KEY (`zzk`),
  KEY `zzk` (`zzk`),
 
 Compound indexes are your friend.  Learn from
  http://mysql.rjweb.org/doc.php/index1
 
  `zzr_StatusTime` mediumtext,
  `zzr_StatusDate` mediumtext,
 Don't need 16MB for a date or time.
 Combining the fields is usually a better idea.
 
 VARCHAR(255), and especially TEXT/MEDIUMTEXT, can hurt performance when temp 
 tables are needed in a query -- use something sensible.
  `NameLastFirstMI` varchar(255) DEFAULT NULL,
 1 letter?  Occupying up to 767 bytes (255 utf8 chars)!
 
 InnoDB is generally faster than MyISAM.
 
  KEY `isdigitized` (`IsDigitized`),
 Rarely is a 'flag' (yes/no) value worth indexing.
  `is_deleted` int(11) DEFAULT NULL,
 If it is just a 0/1 flag, then consider TINYINT UNSIGNED NOT NULL -- it will 
 shrink the space for that field significantly.
 
  `Q01_answer` text,
  `Q01_title` text,
  `Q02_answer` text,
  `Q02_title` text,
 Generally better to have another table for arrays; in this case it might 
 have 4 columns:
 Appt_zzk, question_number, answer, title.
 and multiple rows for each Appt.
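
Rick's "another table for arrays" point, sketched with a hypothetical child
table (the table name, question titles and answers below are illustrative,
not from the poster's schema):

```python
import sqlite3

# One row per (appointment, question) instead of Q01_answer/Q01_title,
# Q02_answer/Q02_title, ... columns on the parent table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appt_questions (
        Appt_zzk        INTEGER NOT NULL,
        question_number INTEGER NOT NULL,
        answer          TEXT,
        title           TEXT,
        PRIMARY KEY (Appt_zzk, question_number)
    );
    INSERT INTO appt_questions VALUES
        (42, 1, 'yes', 'Prior surgery?'),
        (42, 2, 'no',  'Implants?');
""")

# All questions for one appointment, in order -- no per-column games.
rows = conn.execute(
    "SELECT question_number, title, answer FROM appt_questions "
    "WHERE Appt_zzk = ? ORDER BY question_number", (42,)
).fetchall()
print(rows)
```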
 
 ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='InnoDB free: 4068352 kB'
 Sounds confused about the engine??
 
 Most fields are NULLable.  Not a serious issue, but not good practice.
 
 As you add
  INDEX(ProcModeCode, dateexam)
 you can DROP this as being redundant:
  KEY `ProcModeCode` (`ProcModeCode`)
 
 The quick fix is to try the index suggested.  It should change from
 1  SIMPLE  e  index  dateexam,IdAppt,ProcModeCode  PRIMARY  4  NULL  1121043  Using where
 Note the huge estimated number of rows.  That should decline significantly.
 
 What's the value of key_buffer_size?  It should generally be 20% of 
 _available_ RAM (for MyISAM usage).
 
 -Original Message-
 From: James W. McNeely [mailto:jmcne...@nwrads.com]
 Sent: Wednesday, November 14, 2012 3:48 PM
 To: mysql@lists.mysql.com
 Subject: Re: query tuning
 
 Rick,
 
 Thanks! Here is the create table info. Sorry for the size, but you can skip
 down from all the fields to see the indexes.
 
 CREATE TABLE `exams` (
  `zAddDate` date DEFAULT NULL,
  `zModDate` date DEFAULT NULL,
  `IdPatient` varchar(32) DEFAULT NULL,
  `zModBy` varchar(255) DEFAULT NULL,
  `Notes` mediumtext,
  `TechComments` mediumtext,
  `NameLastFirstMI` varchar(255) DEFAULT NULL,
  `IdExam` varchar(32) DEFAULT NULL,
  `IdAppt` varchar(255) DEFAULT NULL,
  `IdProcedure` varchar(255) DEFAULT NULL,
  `ProcName` mediumtext,
  `zzr_StatusBy` mediumtext,
  `zzr_StatusTime` mediumtext,
  `zzr_StatusDate` mediumtext,
  `zAddBy` varchar(255) DEFAULT NULL,
  `zMasterUserTest` varchar(255) DEFAULT NULL,
  `ProcAreaCode` mediumtext,
  `ClinicalICD9_1` mediumtext,
  `ICD9Text` mediumtext,
  `FilmsStandard` mediumtext,
  `FilmsAlternate` mediumtext,
  `OutsideSource` mediumtext,
  `ProcCode` mediumtext,
  `ClinicalText` mediumtext,
  `OutsideStatus` mediumtext,
  `OutsideDate` varchar(255) DEFAULT NULL,
  `ProcModeCode` varchar(16) DEFAULT NULL,
  `DateOfBirth` varchar(255) DEFAULT NULL,
  `PathDiagnosis` mediumtext,
  `PathSize` mediumtext,
  `PathBiopsy` mediumtext,
  `PathBiopsyDate` varchar(255) DEFAULT NULL,
  `ClinicalICD9_2` mediumtext,
  `ClinicalICD9_3` mediumtext,
  `RefPractInstructions` mediumtext,
  `_External` mediumtext,
  `FindCode` mediumtext,
  `NWRaccession` varchar(255) DEFAULT NULL,
  `Gender` varchar(255) DEFAULT NULL,
  `IdOverReadFacility` varchar(255) DEFAULT NULL,
  `ProcDorS` mediumtext,
  `CompareToDate` varchar(255) DEFAULT NULL,
  `IdRefSite` varchar(255) DEFAULT NULL,
  `IsBillable` varchar(255) DEFAULT NULL,
  `LastWordProcRt` mediumtext,
  `IdPerformedBy` varchar(255) DEFAULT NULL,
  `ContrastRiskFactors` mediumtext,
  `TimeToCompletion` varchar(255) DEFAULT NULL,
  `ContrastDose` mediumtext,
  `OutsideSourceOther` mediumtext,
  `SiteSide` mediumtext,
  `SiteLocation` mediumtext,
  `OverReadExamNumber` varchar(255) DEFAULT NULL,
  `IsOverRead` varchar(255) DEFAULT NULL,
  `OverReadCharge` varchar(255) DEFAULT NULL,
  `IsBillIns` varchar(255) DEFAULT NULL,
  `LastWordProcLt` mediumtext,
  `IsRLB` varchar(255) DEFAULT NULL,
  `IsBCHP` varchar(255) DEFAULT NULL,
  `IsNWRCoordUSOverride` varchar(255) DEFAULT NULL,
  `IsTechOnly` varchar(255) DEFAULT NULL,
  `IsProcNotComplete` varchar(255) DEFAULT NULL,
  `IsWaiverSigned` varchar(255) DEFAULT NULL,
  `IdInsur1` varchar(255) DEFAULT NULL,
  `IdInsur2` varchar(255) DEFAULT NULL,
  `IsNoInsurance` varchar(255) DEFAULT NULL,
  `IsPrintBillInfo` varchar(255) DEFAULT NULL,
  `OverReadChargePrinted` varchar(255) DEFAULT NULL,
  `RefSiteOther` varchar(255) DEFAULT NULL,
  `IsOverReadBillIns Copy` varchar(255) DEFAULT NULL,
  `ExamRoom` varchar(255) DEFAULT NULL,
  `IdProtocol` varchar(255) DEFAULT NULL

RE: query tuning

2012-11-15 Thread Iñigo Medina


Good advice indeed. Thanks for sharing.

iñ

On Wed, 14 Nov 2012, Rick James wrote:


A PRIMARY KEY is a KEY, so the second of these is redundant:

  PRIMARY KEY (`zzk`),
  KEY `zzk` (`zzk`),


Compound indexes are your friend.  Learn from
 http://mysql.rjweb.org/doc.php/index1


  `zzr_StatusTime` mediumtext,
  `zzr_StatusDate` mediumtext,

Don't need 16MB for a date or time.
Combining the fields is usually a better idea.

VARCHAR(255), and especially TEXT/MEDIUMTEXT, can hurt performance when temp 
tables are needed in a query -- use something sensible.

  `NameLastFirstMI` varchar(255) DEFAULT NULL,

1 letter?  Occupying up to 767 bytes (255 utf8 chars)!

InnoDB is generally faster than MyISAM.


  KEY `isdigitized` (`IsDigitized`),

Rarely is a 'flag' (yes/no) value worth indexing.

  `is_deleted` int(11) DEFAULT NULL,

If it is just a 0/1 flag, then consider TINYINT UNSIGNED NOT NULL -- it will 
shrink the space for that field significantly.


  `Q01_answer` text,
  `Q01_title` text,
  `Q02_answer` text,
  `Q02_title` text,

Generally better to have another table for arrays; in this case it might have 
4 columns:
Appt_zzk, question_number, answer, title.
and multiple rows for each Appt.


) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='InnoDB free: 4068352 kB'

Sounds confused about the engine??

Most fields are NULLable.  Not a serious issue, but not good practice.

As you add
 INDEX(ProcModeCode, dateexam)
you can DROP this as being redundant:
 KEY `ProcModeCode` (`ProcModeCode`)

The quick fix is to try the index suggested.  It should change from

1  SIMPLE  e  index  dateexam,IdAppt,ProcModeCode  PRIMARY  4  NULL  1121043  Using where

Note the huge estimated number of rows.  That should decline significantly.

What's the value of key_buffer_size?  It should generally be 20% of _available_ 
RAM (for MyISAM usage).


-Original Message-
From: James W. McNeely [mailto:jmcne...@nwrads.com]
Sent: Wednesday, November 14, 2012 3:48 PM
To: mysql@lists.mysql.com
Subject: Re: query tuning

Rick,

Thanks! Here is the create table info. Sorry for the size, but you can skip
down from all the fields to see the indexes.

CREATE TABLE `exams` (
  `zAddDate` date DEFAULT NULL,
  `zModDate` date DEFAULT NULL,
  `IdPatient` varchar(32) DEFAULT NULL,
  `zModBy` varchar(255) DEFAULT NULL,
  `Notes` mediumtext,
  `TechComments` mediumtext,
  `NameLastFirstMI` varchar(255) DEFAULT NULL,
  `IdExam` varchar(32) DEFAULT NULL,
  `IdAppt` varchar(255) DEFAULT NULL,
  `IdProcedure` varchar(255) DEFAULT NULL,
  `ProcName` mediumtext,
  `zzr_StatusBy` mediumtext,
  `zzr_StatusTime` mediumtext,
  `zzr_StatusDate` mediumtext,
  `zAddBy` varchar(255) DEFAULT NULL,
  `zMasterUserTest` varchar(255) DEFAULT NULL,
  `ProcAreaCode` mediumtext,
  `ClinicalICD9_1` mediumtext,
  `ICD9Text` mediumtext,
  `FilmsStandard` mediumtext,
  `FilmsAlternate` mediumtext,
  `OutsideSource` mediumtext,
  `ProcCode` mediumtext,
  `ClinicalText` mediumtext,
  `OutsideStatus` mediumtext,
  `OutsideDate` varchar(255) DEFAULT NULL,
  `ProcModeCode` varchar(16) DEFAULT NULL,
  `DateOfBirth` varchar(255) DEFAULT NULL,
  `PathDiagnosis` mediumtext,
  `PathSize` mediumtext,
  `PathBiopsy` mediumtext,
  `PathBiopsyDate` varchar(255) DEFAULT NULL,
  `ClinicalICD9_2` mediumtext,
  `ClinicalICD9_3` mediumtext,
  `RefPractInstructions` mediumtext,
  `_External` mediumtext,
  `FindCode` mediumtext,
  `NWRaccession` varchar(255) DEFAULT NULL,
  `Gender` varchar(255) DEFAULT NULL,
  `IdOverReadFacility` varchar(255) DEFAULT NULL,
  `ProcDorS` mediumtext,
  `CompareToDate` varchar(255) DEFAULT NULL,
  `IdRefSite` varchar(255) DEFAULT NULL,
  `IsBillable` varchar(255) DEFAULT NULL,
  `LastWordProcRt` mediumtext,
  `IdPerformedBy` varchar(255) DEFAULT NULL,
  `ContrastRiskFactors` mediumtext,
  `TimeToCompletion` varchar(255) DEFAULT NULL,
  `ContrastDose` mediumtext,
  `OutsideSourceOther` mediumtext,
  `SiteSide` mediumtext,
  `SiteLocation` mediumtext,
  `OverReadExamNumber` varchar(255) DEFAULT NULL,
  `IsOverRead` varchar(255) DEFAULT NULL,
  `OverReadCharge` varchar(255) DEFAULT NULL,
  `IsBillIns` varchar(255) DEFAULT NULL,
  `LastWordProcLt` mediumtext,
  `IsRLB` varchar(255) DEFAULT NULL,
  `IsBCHP` varchar(255) DEFAULT NULL,
  `IsNWRCoordUSOverride` varchar(255) DEFAULT NULL,
  `IsTechOnly` varchar(255) DEFAULT NULL,
  `IsProcNotComplete` varchar(255) DEFAULT NULL,
  `IsWaiverSigned` varchar(255) DEFAULT NULL,
  `IdInsur1` varchar(255) DEFAULT NULL,
  `IdInsur2` varchar(255) DEFAULT NULL,
  `IsNoInsurance` varchar(255) DEFAULT NULL,
  `IsPrintBillInfo` varchar(255) DEFAULT NULL,
  `OverReadChargePrinted` varchar(255) DEFAULT NULL,
  `RefSiteOther` varchar(255) DEFAULT NULL,
  `IsOverReadBillIns Copy` varchar(255) DEFAULT NULL,
  `ExamRoom` varchar(255) DEFAULT NULL,
  `IdProtocol` varchar(255) DEFAULT NULL,
  `IsTransDelayed` varchar(255) DEFAULT NULL,
  `Patient Shielded` varchar(255) DEFAULT NULL,
  `Patient Permitted` varchar(255

RE: query tuning

2012-11-14 Thread Rick James
Can't help without the SHOW CREATE TABLEs.

Perhaps e needs one of these:
INDEX(zzk)
INDEX(ProcModeCode, dateexam)  -- in that order
(I can't predict which index it would use.)

Are IdAppt the same datatype and collation in each table?
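
As a rough illustration of the suggested compound index, here is a sketch in
Python/sqlite3 (whose planner differs from MySQL's; the point is only the
column order: the equality column ProcModeCode first, the range column
dateexam second, so the range can still use the index):

```python
import sqlite3

# Toy exams table with Rick's suggested (ProcModeCode, dateexam) index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exams (zzk INTEGER PRIMARY KEY, ProcModeCode TEXT, dateexam TEXT);
    CREATE INDEX idx_mode_date ON exams (ProcModeCode, dateexam);
    INSERT INTO exams (ProcModeCode, dateexam) VALUES
        ('P', '2012-09-05'), ('L', '2012-09-10'), ('P', '2012-10-01');
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT zzk FROM exams
    WHERE ProcModeCode = 'P'
      AND dateexam BETWEEN '2012-09-01' AND '2012-09-30'
""").fetchall()
print(plan)  # the detail column should mention idx_mode_date
```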

 -Original Message-
 From: James W. McNeely [mailto:jmcne...@nwrads.com]
 Sent: Wednesday, November 14, 2012 2:34 PM
 To: mysql@lists.mysql.com
 Subject: query tuning
 
 I have a query I'm having trouble with. If I do this query, which is DATE plus
 ProcModeCode, it is very fast:
 
 SELECT e.zzk FROM exams e
 -- JOIN Appt_ a ON e.IdAppt = a.IdAppt
 -- JOIN Affil_ af ON a.IdAffil_primary = af.IdAffil
 WHERE e.dateexam >= '2012-09-01' AND e.dateexam <= '2012-09-30'
 AND e.ProcModeCode = 'P'
 
 Notice that the joins are commented out.
 
 If I do this query, which doesn't have the ProcModeCode, but does have the
 IdPract across the join, it is also fast, but not as fast:
 
 SELECT e.zzk
 FROM exams e
 JOIN Appt_ a ON e.IdAppt = a.IdAppt
 JOIN Affil_ af ON a.IdAffil_primary = af.IdAffil
 WHERE e.dateexam >= '2012-09-01' AND e.dateexam <= '2012-09-30'
 AND af.IdPract = 'D00400'
 
 BUT, if I do this, with the ProcModeCode AND the IdPract, it is so slow I
 have to kill the query:
 
 SELECT e.zzk
 FROM exams e
 JOIN Appt_ a ON e.IdAppt = a.IdAppt
 JOIN Affil_ af ON a.IdAffil_primary = af.IdAffil
 WHERE e.dateexam >= '2012-09-01' AND e.dateexam <= '2012-09-30'
 AND af.IdPract = 'D00400' AND e.ProcModeCode = 'L'
 ORDER BY e.zzk DESC LIMIT 30
 
 Here is the result of an explain on this:
 
  id  select_type  table  type   possible_keys                 key      key_len  ref                   rows     Extra
  1   SIMPLE       e      index  dateexam,IdAppt,ProcModeCode  PRIMARY  4        NULL                  1121043  Using where
  1   SIMPLE       af     ref    IdAffil,IdPract               IdPract  51       const                 1        Using where
  1   SIMPLE       a      ref    IdAppt                        IdAppt   99       RIS_archive.e.IdAppt  1        Using where
 
 Any ideas about how I can fix this?
 
 TIA
 
 Jim McNeely





Re: query tuning

2012-11-14 Thread James W. McNeely
Rick,

Thanks! Here is the create table info. Sorry for the size, but you can skip 
down from all the fields to see the indexes.

CREATE TABLE `exams` (
  `zAddDate` date DEFAULT NULL,
  `zModDate` date DEFAULT NULL,
  `IdPatient` varchar(32) DEFAULT NULL,
  `zModBy` varchar(255) DEFAULT NULL,
  `Notes` mediumtext,
  `TechComments` mediumtext,
  `NameLastFirstMI` varchar(255) DEFAULT NULL,
  `IdExam` varchar(32) DEFAULT NULL,
  `IdAppt` varchar(255) DEFAULT NULL,
  `IdProcedure` varchar(255) DEFAULT NULL,
  `ProcName` mediumtext,
  `zzr_StatusBy` mediumtext,
  `zzr_StatusTime` mediumtext,
  `zzr_StatusDate` mediumtext,
  `zAddBy` varchar(255) DEFAULT NULL,
  `zMasterUserTest` varchar(255) DEFAULT NULL,
  `ProcAreaCode` mediumtext,
  `ClinicalICD9_1` mediumtext,
  `ICD9Text` mediumtext,
  `FilmsStandard` mediumtext,
  `FilmsAlternate` mediumtext,
  `OutsideSource` mediumtext,
  `ProcCode` mediumtext,
  `ClinicalText` mediumtext,
  `OutsideStatus` mediumtext,
  `OutsideDate` varchar(255) DEFAULT NULL,
  `ProcModeCode` varchar(16) DEFAULT NULL,
  `DateOfBirth` varchar(255) DEFAULT NULL,
  `PathDiagnosis` mediumtext,
  `PathSize` mediumtext,
  `PathBiopsy` mediumtext,
  `PathBiopsyDate` varchar(255) DEFAULT NULL,
  `ClinicalICD9_2` mediumtext,
  `ClinicalICD9_3` mediumtext,
  `RefPractInstructions` mediumtext,
  `_External` mediumtext,
  `FindCode` mediumtext,
  `NWRaccession` varchar(255) DEFAULT NULL,
  `Gender` varchar(255) DEFAULT NULL,
  `IdOverReadFacility` varchar(255) DEFAULT NULL,
  `ProcDorS` mediumtext,
  `CompareToDate` varchar(255) DEFAULT NULL,
  `IdRefSite` varchar(255) DEFAULT NULL,
  `IsBillable` varchar(255) DEFAULT NULL,
  `LastWordProcRt` mediumtext,
  `IdPerformedBy` varchar(255) DEFAULT NULL,
  `ContrastRiskFactors` mediumtext,
  `TimeToCompletion` varchar(255) DEFAULT NULL,
  `ContrastDose` mediumtext,
  `OutsideSourceOther` mediumtext,
  `SiteSide` mediumtext,
  `SiteLocation` mediumtext,
  `OverReadExamNumber` varchar(255) DEFAULT NULL,
  `IsOverRead` varchar(255) DEFAULT NULL,
  `OverReadCharge` varchar(255) DEFAULT NULL,
  `IsBillIns` varchar(255) DEFAULT NULL,
  `LastWordProcLt` mediumtext,
  `IsRLB` varchar(255) DEFAULT NULL,
  `IsBCHP` varchar(255) DEFAULT NULL,
  `IsNWRCoordUSOverride` varchar(255) DEFAULT NULL,
  `IsTechOnly` varchar(255) DEFAULT NULL,
  `IsProcNotComplete` varchar(255) DEFAULT NULL,
  `IsWaiverSigned` varchar(255) DEFAULT NULL,
  `IdInsur1` varchar(255) DEFAULT NULL,
  `IdInsur2` varchar(255) DEFAULT NULL,
  `IsNoInsurance` varchar(255) DEFAULT NULL,
  `IsPrintBillInfo` varchar(255) DEFAULT NULL,
  `OverReadChargePrinted` varchar(255) DEFAULT NULL,
  `RefSiteOther` varchar(255) DEFAULT NULL,
  `IsOverReadBillIns Copy` varchar(255) DEFAULT NULL,
  `ExamRoom` varchar(255) DEFAULT NULL,
  `IdProtocol` varchar(255) DEFAULT NULL,
  `IsTransDelayed` varchar(255) DEFAULT NULL,
  `Patient Shielded` varchar(255) DEFAULT NULL,
  `Patient Permitted` varchar(255) DEFAULT NULL,
  `LMP Status` varchar(255) DEFAULT NULL,
  `Flouro Time` varchar(255) DEFAULT NULL,
  `IsWetread` varchar(255) DEFAULT NULL,
  `zFlag` varchar(255) DEFAULT NULL,
  `FindingFinal` mediumtext,
  `IdExamsPrior` mediumtext,
  `ExamDelayComment` mediumtext,
  `NotesBillingAppt` mediumtext,
  `ExportBatch` varchar(255) DEFAULT NULL,
  `AgeRange` varchar(255) DEFAULT NULL,
  `Exam Start` varchar(255) DEFAULT NULL,
  `Exam Finish` varchar(255) DEFAULT NULL,
  `IsNWRBillLeaseBack` varchar(255) DEFAULT NULL,
  `MRIExamIndicated` varchar(255) DEFAULT NULL,
  `MRIExamNegPos` varchar(255) DEFAULT NULL,
  `zContinueFlag` varchar(255) DEFAULT NULL,
  `IsLeaseBack` varchar(255) DEFAULT NULL,
  `ContrastConcentration` mediumtext,
  `NotesBillingExam` mediumtext,
  `LastWordAccNum` varchar(255) DEFAULT NULL,
  `LastWordExportDate` varchar(255) DEFAULT NULL,
  `LastWordAcctNum` varchar(255) DEFAULT NULL,
  `LastWordExportFlag` varchar(255) DEFAULT NULL,
  `LeaseBackOverageKey` varchar(255) DEFAULT NULL,
  `LastWordPrePost` varchar(255) DEFAULT NULL,
  `LastWordExportBatch` varchar(255) DEFAULT NULL,
  `IsFilmed` varchar(255) DEFAULT NULL,
  `zCreateTime` varchar(255) DEFAULT NULL,
  `zzr_OperICDPre` mediumtext,
  `zzr_OperICDPost` mediumtext,
  `ClinicalTextPostOp` mediumtext,
  `zzr_OperLocation` mediumtext,
  `OperTemp` varchar(255) DEFAULT NULL,
  `OperPulse` varchar(255) DEFAULT NULL,
  `OperBloodPressure` varchar(255) DEFAULT NULL,
  `OperAirSaturation` varchar(255) DEFAULT NULL,
  `zDateExamAnnual` varchar(255) DEFAULT NULL,
  `OperPainLevel` varchar(255) DEFAULT NULL,
  `zzr_OperSide` mediumtext,
  `Surgeon` varchar(255) DEFAULT NULL,
  `OperRespiration` varchar(255) DEFAULT NULL,
  `OperSeriesEnd` varchar(255) DEFAULT NULL,
  `OperSeriesNumber` varchar(255) DEFAULT NULL,
  `zModTimestamp` datetime DEFAULT NULL,
  `IdRelatedProcedure` varchar(255) DEFAULT NULL,
  `ExamFailed` varchar(255) DEFAULT NULL,
  `IdSurgeon` varchar(255) DEFAULT NULL,
  `IsDigitized` varchar(255) DEFAULT NULL,
  `CorrelatingExam` 

RE: query tuning

2012-11-14 Thread Rick James
A PRIMARY KEY is a KEY, so the second of these is redundant:
   PRIMARY KEY (`zzk`),
   KEY `zzk` (`zzk`),

Compound indexes are your friend.  Learn from
  http://mysql.rjweb.org/doc.php/index1

   `zzr_StatusTime` mediumtext,
   `zzr_StatusDate` mediumtext,
Don't need 16MB for a date or time.
Combining the fields is usually a better idea.

VARCHAR(255), and especially TEXT/MEDIUMTEXT, can hurt performance when temp 
tables are needed in a query -- use sensible sizes instead.
   `NameLastFirstMI` varchar(255) DEFAULT NULL,
1 letter?  Occupying up to 767 bytes (255 utf8 chars)!

InnoDB is generally faster than MyISAM.

   KEY `isdigitized` (`IsDigitized`),
Rarely is a 'flag' (yes/no) value worth indexing.
   `is_deleted` int(11) DEFAULT NULL,
If it is just a 0/1 flag, then consider TINYINT UNSIGNED NOT NULL -- it will 
shrink the space for that field significantly.

   `Q01_answer` text,
   `Q01_title` text,
   `Q02_answer` text,
   `Q02_title` text,
Generally better to have another table for arrays; in this case it might have 
4 columns:
Appt_zzk, question_number, answer, title.
and multiple rows for each Appt.
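A hedged sketch of that child table (all names are hypothetical, following the columns Rick lists), using Python's sqlite3 purely so the idea is runnable:

```python
import sqlite3

# Rick's suggestion sketched: instead of Q01_answer/Q01_title/Q02_... columns,
# one child table holds a row per (appointment, question). Names hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE appt_answers (
        appt_zzk        INTEGER NOT NULL,   -- FK to the Appt_ row
        question_number INTEGER NOT NULL,
        title           TEXT,
        answer          TEXT,
        PRIMARY KEY (appt_zzk, question_number)
    )
""")
conn.executemany("INSERT INTO appt_answers VALUES (?, ?, ?, ?)",
                 [(1, 1, "Allergies?", "None"),
                  (1, 2, "Pregnant?", "No")])

# All answers for one appointment, in question order.
rows = conn.execute("""
    SELECT question_number, title, answer
    FROM appt_answers WHERE appt_zzk = 1
    ORDER BY question_number
""").fetchall()
print(rows)
```

Adding a question then costs one row per appointment instead of two new columns on a 100+ column table.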

 ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='InnoDB free: 4068352 kB'
Sounds confused about the engine??

Most fields are NULLable.  Not a serious issue, but not good practice.

As you add
  INDEX(ProcModeCode, dateexam)
you can DROP this as being redundant:
  KEY `ProcModeCode` (`ProcModeCode`)

The quick fix is to try the index suggested.  The EXPLAIN row for `e` should change from
 1 SIMPLE  e  index  dateexam,IdAppt,ProcModeCode  PRIMARY  4  NULL  1121043  Using where
Note the huge estimated number of rows (1121043).  That should decline significantly.

What's the value of key_buffer_size?  It should generally be 20% of _available_ 
RAM (for MyISAM usage).
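The effect of the suggested compound index can be sketched with a self-contained demo. It uses Python's sqlite3 (not MySQL) purely so the plan change is easy to reproduce; the table and data are toy stand-ins, and in MySQL the index would be added with ALTER TABLE exams ADD INDEX (ProcModeCode, dateexam):

```python
import sqlite3

# A compound index on (ProcModeCode, dateexam) lets the optimizer seek
# directly to the matching rows instead of scanning the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exams (zzk INTEGER PRIMARY KEY, dateexam TEXT, ProcModeCode TEXT)")
conn.executemany("INSERT INTO exams (dateexam, ProcModeCode) VALUES (?, ?)",
                 [(f"2012-09-{d:02d}", m) for d in range(1, 31) for m in ("P", "L")])

query = ("SELECT zzk FROM exams "
         "WHERE ProcModeCode = 'L' AND dateexam BETWEEN '2012-09-01' AND '2012-09-30'")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)   # no usable index yet: a full scan
conn.execute("CREATE INDEX idx_mode_date ON exams (ProcModeCode, dateexam)")
after = plan(query)    # now a seek on the compound index

print(before)
print(after)
```

The "before" plan is a scan of every row (the analogue of the 1121043-row estimate above); the "after" plan names the compound index.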

 -Original Message-
 From: James W. McNeely [mailto:jmcne...@nwrads.com]
 Sent: Wednesday, November 14, 2012 3:48 PM
 To: mysql@lists.mysql.com
 Subject: Re: query tuning

 Rick,

 Thanks! Here is the create table info. Sorry for the size, but you can skip
 down from all the fields to see the indexes.

 CREATE TABLE `exams` (
   `zAddDate` date DEFAULT NULL,
   `zModDate` date DEFAULT NULL,
   `IdPatient` varchar(32) DEFAULT NULL,
   `zModBy` varchar(255) DEFAULT NULL,
   `Notes` mediumtext,
   `TechComments` mediumtext,
   `NameLastFirstMI` varchar(255) DEFAULT NULL,
   `IdExam` varchar(32) DEFAULT NULL,
   `IdAppt` varchar(255) DEFAULT NULL,
   `IdProcedure` varchar(255) DEFAULT NULL,
   `ProcName` mediumtext,
   `zzr_StatusBy` mediumtext,
   `zzr_StatusTime` mediumtext,
   `zzr_StatusDate` mediumtext,
   `zAddBy` varchar(255) DEFAULT NULL,
   `zMasterUserTest` varchar(255) DEFAULT NULL,
   `ProcAreaCode` mediumtext,
   `ClinicalICD9_1` mediumtext,
   `ICD9Text` mediumtext,
   `FilmsStandard` mediumtext,
   `FilmsAlternate` mediumtext,
   `OutsideSource` mediumtext,
   `ProcCode` mediumtext,
   `ClinicalText` mediumtext,
   `OutsideStatus` mediumtext,
   `OutsideDate` varchar(255) DEFAULT NULL,
   `ProcModeCode` varchar(16) DEFAULT NULL,
   `DateOfBirth` varchar(255) DEFAULT NULL,
   `PathDiagnosis` mediumtext,
   `PathSize` mediumtext,
   `PathBiopsy` mediumtext,
   `PathBiopsyDate` varchar(255) DEFAULT NULL,
   `ClinicalICD9_2` mediumtext,
   `ClinicalICD9_3` mediumtext,
   `RefPractInstructions` mediumtext,
   `_External` mediumtext,
   `FindCode` mediumtext,
   `NWRaccession` varchar(255) DEFAULT NULL,
   `Gender` varchar(255) DEFAULT NULL,
   `IdOverReadFacility` varchar(255) DEFAULT NULL,
   `ProcDorS` mediumtext,
   `CompareToDate` varchar(255) DEFAULT NULL,
   `IdRefSite` varchar(255) DEFAULT NULL,
   `IsBillable` varchar(255) DEFAULT NULL,
   `LastWordProcRt` mediumtext,
   `IdPerformedBy` varchar(255) DEFAULT NULL,
   `ContrastRiskFactors` mediumtext,
   `TimeToCompletion` varchar(255) DEFAULT NULL,
   `ContrastDose` mediumtext,
   `OutsideSourceOther` mediumtext,
   `SiteSide` mediumtext,
   `SiteLocation` mediumtext,
   `OverReadExamNumber` varchar(255) DEFAULT NULL,
   `IsOverRead` varchar(255) DEFAULT NULL,
   `OverReadCharge` varchar(255) DEFAULT NULL,
   `IsBillIns` varchar(255) DEFAULT NULL,
   `LastWordProcLt` mediumtext,
   `IsRLB` varchar(255) DEFAULT NULL,
   `IsBCHP` varchar(255) DEFAULT NULL,
   `IsNWRCoordUSOverride` varchar(255) DEFAULT NULL,
   `IsTechOnly` varchar(255) DEFAULT NULL,
   `IsProcNotComplete` varchar(255) DEFAULT NULL,
   `IsWaiverSigned` varchar(255) DEFAULT NULL,
   `IdInsur1` varchar(255) DEFAULT NULL,
   `IdInsur2` varchar(255) DEFAULT NULL,
   `IsNoInsurance` varchar(255) DEFAULT NULL,
   `IsPrintBillInfo` varchar(255) DEFAULT NULL,
   `OverReadChargePrinted` varchar(255) DEFAULT NULL,
   `RefSiteOther` varchar(255) DEFAULT NULL,
   `IsOverReadBillIns Copy` varchar(255) DEFAULT NULL,
   `ExamRoom` varchar(255) DEFAULT NULL,
   `IdProtocol` varchar(255) DEFAULT NULL,
   `IsTransDelayed` varchar(255) DEFAULT NULL,
   `Patient Shielded` varchar(255) DEFAULT NULL,
   `Patient Permitted` varchar(255) DEFAULT

Re: Query Cache Crashing

2012-10-10 Thread Ben Clewett

Hi Rick,

Thanks for the advice.  I have now set my query-cache to zero.

(I take your point about query cache too large.  I understand that a 
smaller cache size, and the use of the SQL_NO_CACHE and SQL_CACHE 
directives can be used to control which queries are cached.  Therefore 
trying to get the advantage without the disadvantage.  But this is a lot 
of work to change every query we ever run!)


I am working on reproducing the error, but so far it appears to be random.

Ben



On 2012-10-09 18:44, Rick James wrote:

As for the crash, I don't know.  Instead, I recommend either shrinking the size 
(if you use a big QC) or turning it off.  This would make the issue go away.

ANY modification to a particular table leads to ALL entries in the Query cache 
being purged.  For that reason, we (Yahoo) almost never use the QC on any of 
our many servers.

Please provide
SHOW GLOBAL VARIABLES LIKE 'query%';
SHOW GLOBAL STATUS LIKE 'Qc%';

Because of inefficiencies in 'pruning', having a query_cache_size bigger than 
50M may actually degrade performance.

If you have a reproducible test case, submit it to bugs.mysql.com .



-Original Message-
From: Ben Clewett [mailto:b...@clewett.org.uk]
Sent: Tuesday, October 09, 2012 4:47 AM
To: mysql@lists.mysql.com
Subject: Query Cache Crashing

Hi MySql,

Since upgrading to 5.5.27 (5.5.27-ndb-7.2.8-cluster-gpl-log to be
exact) I have experienced problems with the query cache.  I am
wondering if I am the only one?

Twice I have had a core-dump (show at end of mail) on two separate
servers running this version.

Now I had a complete lock, where 'SHOW PROCESSLIST' showed this for
every connection:

Waiting for query cache lock

This resulted in connections building until the limit was hit.  I could
only cure this with a 'kill -9' on the mysqld process.

Are any other users experiencing this?  Do any users know of a fix?

Thanks!

Ben Clewett.


Thread pointer: 0x7f6ea014cf90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 410220e8 thread_stack 0x4
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x81c7f5]
/usr/sbin/mysqld(handle_fatal_signal+0x403)[0x6e9383]
/lib64/libpthread.so.0[0x7f6f491b2b30]
/usr/sbin/mysqld(_ZN11Query_cache12unlink_tableEP23Query_cache_block_ta
ble+0x1b)[0x5bd1ab]

/usr/sbin/mysqld(_ZN11Query_cache19free_query_internalEP17Query_cache_b
lock+0x7a)[0x5bdb5a]

/usr/sbin/mysqld(_ZN11Query_cache27invalidate_query_block_listEP3THDP23
Query_cache_block_table+0x7e)[0x5be8ee]

/usr/sbin/mysqld(_ZN11Query_cache12insert_tableEjPcP23Query_cache_block
_tablejhPFcP3THDS0_jPyEy+0x91)[0x5bf7e1]

/usr/sbin/mysqld(_ZN11Query_cache25register_tables_from_listEP10TABLE_L
ISTjP23Query_cache_block_table+0x176)[0x5bfa76]

/usr/sbin/mysqld(_ZN11Query_cache19register_all_tablesEP17Query_cache_b
lockP10TABLE_LISTj+0x15)[0x5bfbb5]

/usr/sbin/mysqld(_ZN11Query_cache11store_queryEP3THDP10TABLE_LIST+0x39e
)[0x5bff5e]

/usr/sbin/mysqld[0x5eb02d]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x269b)[0x5f0aeb]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x18a)[0x5f2cba
]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x13
21)[0x5f4861]

/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x167)[0x68d807]
/usr/sbin/mysqld(handle_one_connection+0x54)[0x68d874]
/lib64/libpthread.so.0[0x7f6f491ab040]
/lib64/libc.so.6(clone+0x6d)[0x7f6f4847c08d]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: Query Cache Crashing

2012-10-09 Thread Rick James
As for the crash, I don't know.  Instead, I recommend either shrinking the size 
(if you use a big QC) or turning it off.  This would make the issue go away.

ANY modification to a particular table leads to ALL entries in the Query cache 
being purged.  For that reason, we (Yahoo) almost never use the QC on any of 
our many servers.

Please provide
SHOW GLOBAL VARIABLES LIKE 'query%';
SHOW GLOBAL STATUS LIKE 'Qc%';

Because of inefficiencies in 'pruning', having a query_cache_size bigger than 
50M may actually degrade performance.

If you have a reproducible test case, submit it to bugs.mysql.com .


 -Original Message-
 From: Ben Clewett [mailto:b...@clewett.org.uk]
 Sent: Tuesday, October 09, 2012 4:47 AM
 To: mysql@lists.mysql.com
 Subject: Query Cache Crashing
 
 Hi MySql,
 
 Since upgrading to 5.5.27 (5.5.27-ndb-7.2.8-cluster-gpl-log to be
 exact) I have experienced problems with the query cache.  I am
 wondering if I am the only one?
 
 Twice I have had a core-dump (show at end of mail) on two separate
 servers running this version.
 
 Now I had a complete lock, where 'SHOW PROCESSLIST' showed this for
 every connection:
 
 Waiting for query cache lock
 
 This resulted in connections building until the limit was hit.  I could
 only cure this with a 'kill -9' on the mysqld process.
 
 Are any other users experiencing this?  Do any users know of a fix?
 
 Thanks!
 
 Ben Clewett.
 
 
 Thread pointer: 0x7f6ea014cf90
 Attempting backtrace. You can use the following information to find out
 where mysqld died. If you see no messages after this, something went
 terribly wrong...
 stack_bottom = 410220e8 thread_stack 0x4
 /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x81c7f5]
 /usr/sbin/mysqld(handle_fatal_signal+0x403)[0x6e9383]
 /lib64/libpthread.so.0[0x7f6f491b2b30]
 /usr/sbin/mysqld(_ZN11Query_cache12unlink_tableEP23Query_cache_block_ta
 ble+0x1b)[0x5bd1ab]
 
 /usr/sbin/mysqld(_ZN11Query_cache19free_query_internalEP17Query_cache_b
 lock+0x7a)[0x5bdb5a]
 
 /usr/sbin/mysqld(_ZN11Query_cache27invalidate_query_block_listEP3THDP23
 Query_cache_block_table+0x7e)[0x5be8ee]
 
 /usr/sbin/mysqld(_ZN11Query_cache12insert_tableEjPcP23Query_cache_block
 _tablejhPFcP3THDS0_jPyEy+0x91)[0x5bf7e1]
 
 /usr/sbin/mysqld(_ZN11Query_cache25register_tables_from_listEP10TABLE_L
 ISTjP23Query_cache_block_table+0x176)[0x5bfa76]
 
 /usr/sbin/mysqld(_ZN11Query_cache19register_all_tablesEP17Query_cache_b
 lockP10TABLE_LISTj+0x15)[0x5bfbb5]
 
 /usr/sbin/mysqld(_ZN11Query_cache11store_queryEP3THDP10TABLE_LIST+0x39e
 )[0x5bff5e]
 
 /usr/sbin/mysqld[0x5eb02d]
 /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x269b)[0x5f0aeb]
 /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x18a)[0x5f2cba
 ]
 /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x13
 21)[0x5f4861]
 
 /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x167)[0x68d807]
 /usr/sbin/mysqld(handle_one_connection+0x54)[0x68d874]
 /lib64/libpthread.so.0[0x7f6f491ab040]
 /lib64/libc.so.6(clone+0x6d)[0x7f6f4847c08d]
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: query help

2012-09-13 Thread Stillman, Benjamin
I think this will get you there:

SELECT LEAD, COUNT(*) FROM ORDERS GROUP BY LEAD;

It'll give you something more like:

| LEAD | COUNT(*) |
| F    | 44       |
| S    | 122      |
| R    | 32       |



-Original Message-
From: Richard Reina [mailto:gatorre...@gmail.com]
Sent: Thursday, September 13, 2012 9:51 AM
To: mysql@lists.mysql.com
Subject: query help

I have a table like this:

|ORDERS|
| ID  | DATE    | QNT | LEAD |
| 342 | 8-12-12 | 32  | F    |
| 345 | 8-15-12 | 12  | S    |
| 349 | 8-16-12 | 9   | R    |

I am looking for a way to query it with counts by the LEAD column, in order to 
tell what the number of each lead type is, so that I get something like this:

F_LEADS  S_LEADS  R_LEADS
 44 122 32

Is this possible?  If so can anyone help with syntax?

Thanks,

Richard

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql




Notice: This communication may contain privileged and/or confidential 
information. If you are not the intended recipient, please notify the sender by 
email, and immediately delete the message and any attachments without copying 
or disclosing them. LBI may, for any reason, intercept, access, use, and 
disclose any information that is communicated by or through, or which is stored 
on, its networks, applications, services, and devices.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: query help

2012-09-13 Thread Rick James
Transposing is ugly in SQL.  It is better done in some other language (PHP, 
Java, ...)

If you must do it in SQL, search for
pivot Peter Brawley
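If the counts really must come back as columns from SQL, conditional aggregation is one common pivot pattern (the kind those writeups cover). A sketch with Python's sqlite3 for a runnable demo; the same SUM(CASE ...) form works in MySQL, and the table, column names, and data are hypothetical stand-ins for the ORDERS example above:

```python
import sqlite3

# Pivot per-LEAD counts into columns in a single pass over the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qnt INTEGER, lead_code TEXT)")
conn.executemany("INSERT INTO orders (qnt, lead_code) VALUES (?, ?)",
                 [(32, "F"), (12, "S"), (9, "R"), (5, "S"), (7, "F")])

row = conn.execute("""
    SELECT SUM(CASE WHEN lead_code = 'F' THEN 1 ELSE 0 END) AS f_leads,
           SUM(CASE WHEN lead_code = 'S' THEN 1 ELSE 0 END) AS s_leads,
           SUM(CASE WHEN lead_code = 'R' THEN 1 ELSE 0 END) AS r_leads
    FROM orders
""").fetchone()
print(row)  # (2, 2, 1)
```

The drawback, as noted above, is that each lead type must be spelled out in the query, which is why post-processing the GROUP BY result in the application is usually cleaner.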

 -Original Message-
 From: Stillman, Benjamin [mailto:bstill...@limitedbrands.com]
 Sent: Thursday, September 13, 2012 7:09 AM
 To: 'Richard Reina'; mysql@lists.mysql.com
 Subject: RE: query help
 
 I think this will get you there:
 
 SELECT LEAD, COUNT(*) FROM ORDERS GROUP BY LEAD;
 
 It'll give you something more like:
 
 | LEAD | COUNT(*) |
 | F | 44 |
 | S | 122   |
 | R | 32 |
 
 
 
 -Original Message-
 From: Richard Reina [mailto:gatorre...@gmail.com]
 Sent: Thursday, September 13, 2012 9:51 AM
 To: mysql@lists.mysql.com
 Subject: query help
 
 I have a table like this:
 
 |ORDERS|
 |ID| DATE | QNT | LEAD |
 |342  | 8-12-12 | 32   | F|
 |345  | 8-15-12 | 12   | S|
 |349  | 8-16-12 | 9 | R|
 
 I am looking for a way to query it with counts by the LEAD column in
 order to tell what the number of each type lead is, so that I get
 something like this:
 
 F_LEADS  S_LEADS  R_LEADS
  44 122 32
 
 Is this possible?  If so can anyone help with syntax?
 
 Thanks,
 
 Richard
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql
 
 
 
 
 Notice: This communication may contain privileged and/or confidential
 information. If you are not the intended recipient, please notify the
 sender by email, and immediately delete the message and any attachments
 without copying or disclosing them. LBI may, for any reason, intercept,
 access, use, and disclose any information that is communicated by or
 through, or which is stored on, its networks, applications, services,
 and devices.
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: query help

2012-08-07 Thread hsv
 2012/07/31 15:02 -0700, Haluk Karamete 
So, in a case like this

shop.orders.32442
shop.orders.82000
shop.orders.34442

It would be the record whose source_recid  is shop.orders.82000. Why? Cause
82000 happens to be the largest integer.

Now, if they are always 5-digit-long integers, try ending with
ORDER BY SUBSTRING_INDEX(source_recid, '.', -1)
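When the suffix is not fixed-width, string order and numeric order diverge (e.g. '9' sorts after '82000'), so the general fix is to compare the suffix as a number — in MySQL, SUBSTRING_INDEX(source_recid, '.', -1) + 0. A plain-Python sketch of the same idea, using the sample values from the thread:

```python
# Each recid is DATABASENAME.TABLENAME.RECID; the piece after the last '.'
# is the integer to compare. Sample values taken from the thread.
recids = ["shop.orders.32442", "shop.orders.82000", "shop.orders.34442"]

def recid_number(recid: str) -> int:
    # Everything after the last '.' is the RECID integer.
    return int(recid.rsplit(".", 1)[1])

largest = max(recids, key=recid_number)
print(largest)  # shop.orders.82000
```

Note that comparing numerically also handles a short id like 'shop.orders.9' correctly, where lexicographic ordering would not.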


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: query help

2012-08-01 Thread Rick James
Might need some type coercion:

SELECT SUBSTRING(recid, 13, column size) AS numbers FROM table
 ORDER BY SUBSTRING(recid, 13, column size)+0 DESC

 -Original Message-
 From: Paul Halliday [mailto:paul.halli...@gmail.com]
 Sent: Tuesday, July 31, 2012 3:27 PM
 To: Haluk Karamete
 Cc: MySQL
 Subject: Re: query help
 
 On Tue, Jul 31, 2012 at 7:02 PM, Haluk Karamete
 halukkaram...@gmail.com wrote:
  I've got a text field called source_recid. It stores half string half
  number like strings in it.
 
  Example
 
  shop.orders.32442
 
  the syntax is DATABASENAME.TABLENAME.RECID
 
  My goal is to scan this col and find out the biggest RECID ( the
  integer) in it.
 
  So, in a case like this
 
  shop.orders.32442
  shop.orders.82000
  shop.orders.34442
 
  It would be the record whose source_recid  is shop.orders.82000. Why?
  Cause
  82000 happens to be the largest integer.
 
  What SQL statement would get me that record?
 
  One option to this is to create a new column ( the_ids ) and move all
  the integers in it and then run something like this
 
  select source_recid from mytable where source_recid like
 'shop.orders.%'
  order by the_ids DESC LIMIT 1
 
  Is there a way to pull this off without going thru this step?
 
 Would substring work?
 
 SELECT SUBSTRING(recid, 13, column size) AS numbers FROM table ORDER
 BY numbers DESC
 
 
 --
 Paul Halliday
 http://www.pintumbler.org/
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: query help

2012-08-01 Thread Vikas Shukla
Hi,

Use LIMIT 1 to limit the number of output to single record.

Regards,

Vikas Shukla


  On Wed, Aug 1, 2012 at 3:56 AM, Paul Halliday paul.halli...@gmail.com wrote:

  On Tue, Jul 31, 2012 at 7:02 PM, Haluk Karamete halukkaram...@gmail.com
 wrote:
  I've got a text field called source_recid. It stores half string half
  number like strings in it.
 
  Example
 
  shop.orders.32442
 
  the syntax is DATABASENAME.TABLENAME.RECID
 
  My goal is to scan this col and find out the biggest RECID ( the
  integer) in it.
 
  So, in a case like this
 
  shop.orders.32442
  shop.orders.82000
  shop.orders.34442
 
  It would be the record whose source_recid  is shop.orders.82000. Why?
 Cause
  82000 happens to be the largest integer.
 
  What SQL statement would get me that record?
 
  One option to this is to create a new column ( the_ids ) and move all the
  integers in it and then run something like this
 
  select source_recid from mytable where source_recid like 'shop.orders.%'
  order by the_ids DESC LIMIT 1
 
  Is there a way to pull this off without going thru this step?

 Would substring work?

 SELECT SUBSTRING(recid, 13, column size) AS numbers FROM table ORDER
 BY numbers DESC


 --
 Paul Halliday
 http://www.pintumbler.org/

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql




Re: query help

2012-07-31 Thread Paul Halliday
On Tue, Jul 31, 2012 at 7:02 PM, Haluk Karamete halukkaram...@gmail.com wrote:
 I've got a text field called source_recid. It stores half string half
 number like strings in it.

 Example

 shop.orders.32442

 the syntax is DATABASENAME.TABLENAME.RECID

 My goal is to scan this col and find out the biggest RECID ( the
 integer) in it.

 So, in a case like this

 shop.orders.32442
 shop.orders.82000
 shop.orders.34442

 It would be the record whose source_recid  is shop.orders.82000. Why? Cause
 82000 happens to be the largest integer.

 What SQL statement would get me that record?

 One option to this is to create a new column ( the_ids ) and move all the
 integers in it and then run something like this

 select source_recid from mytable where source_recid like 'shop.orders.%'
 order by the_ids DESC LIMIT 1

 Is there a way to pull this off without going thru this step?

Would substring work?

SELECT SUBSTRING(recid, 13, column size) AS numbers FROM table ORDER
BY numbers DESC


-- 
Paul Halliday
http://www.pintumbler.org/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query take too long time - please help!

2012-07-10 Thread Ananda Kumar
can u show the explain plan for your query

On Tue, Jul 10, 2012 at 2:59 PM, Darek Maciera darekmaci...@gmail.com wrote:

 Hello,

 I have table:

 mysql DESCRIBE books;

 | Field | Type      | Null | Key | Default           | Extra          |
 | id    | int(255)  | NO   | PRI | NULL              | auto_increment |
 | idu   | int(255)  | NO   | MUL | NULL              |                |
 | ksd   | char(15)  | YES  |     | NULL              |                |
 | idn   | int(1)    | NO   | MUL | NULL              |                |
 | text1 | text      | NO   |     | NULL              |                |
 | ips   | int(1)    | NO   | MUL | NULL              |                |
 | ips2  | int(1)    | NO   | MUL | NULL              |                |
 | date  | timestamp | NO   |     | CURRENT_TIMESTAMP |                |
 | date2 | date      | NO   |     | NULL              |                |
 | text2 | text      | NO   |     | NULL              |                |
 | text3 | text      | NO   |     | NULL              |                |

 I have indexes in this table:

 mysql SHOW INDEX FROM uzytkownicy_strona_kody;

 | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
 | books | 0          | PRIMARY  | 1            | id          | A         | 369625      | BTREE      |
 | books | 1          | idx_idu  | 1            | idu         | A         | 184812      | BTREE      |
 | books | 1          | idx_id   | 1            | id          | A         | 369625      | BTREE      |
 | books | 1          | idx_idn  | 1            | idn         | A         | 8           | BTREE      |
 | books | 1          | idx_ips  | 1            | ips         | A         | 8           | BTREE      |
 | books | 1          | idx_ips2 | 1            | ips2        | A         | 8           | BTREE      |

 Every book has a unique 'ksd'. There are about 370,000 records in this
 table.

 But this SELECT take too long time:

 mysql SELECT * FROM books WHERE ksd ='A309CC47B7';
 1 row in set (2.59 sec)


 The table uses the InnoDB engine. I added to my.cnf:   innodb_buffer_pool_size =
 512M

 Any suggestions? Help, please..

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql




Re: Query take too long time - please help!

2012-07-10 Thread Ananda Kumar
You are using a function, LOWER(), which prevents use of the unique key
index on ksd.
MySQL does not support function-based indexes, so your query is doing a
FULL TABLE scan and taking more time.
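The index-defeating effect of wrapping the column in a function can be reproduced in a few lines. This sketch uses Python's sqlite3 rather than MySQL (the plan text differs, but the behavior is the same), with a hypothetical stand-in for the `books` table:

```python
import sqlite3

# A function applied to an indexed column hides the index from the optimizer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, ksd TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_ksd ON books (ksd)")
conn.executemany("INSERT INTO books (ksd) VALUES (?)",
                 [(f"A309CC47B{i}",) for i in range(10)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function on the column: the index cannot be used -> full table scan.
with_lower = plan("SELECT * FROM books WHERE LOWER(ksd) = LOWER('a309cc47b7')")
# Bare column comparison: the unique index is used for a direct seek.
without = plan("SELECT * FROM books WHERE ksd = 'A309CC47B7'")

print(with_lower)
print(without)
```

Since ksd holds hex-like codes, storing it in one canonical case (or using a case-insensitive collation in MySQL) lets the query drop LOWER() entirely and use the index.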

On Tue, Jul 10, 2012 at 4:46 PM, Darek Maciera darekmaci...@gmail.com wrote:

 2012/7/10 Ananda Kumar anan...@gmail.com:
  can u show the explain plan for your query
 

 Thanks, for reply!

 Sure:

 mysql EXPLAIN SELECT * FROM books WHERE LOWER(ksd)=LOWER('4204661375');

 +----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
 | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
 +----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
 |  1 | SIMPLE      | books | ALL  | NULL          | NULL | NULL    | NULL | 378241 | Using where |
 +----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
 1 row in set (0.00 sec)

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql




Re: Query help...

2012-05-28 Thread hsv
 2012/05/28 08:03 -0700, Don Wieland 
Any assistance would be appreciated. Thanks!


Maybe something like this:


SELECT usie.client_id, first_name, last_name, COUNT(anie.client_id)

FROM
(SELECT client_id, first_name, last_name, time_start
FROM tl_appt
JOIN tl_rooms USING(room_id)
JOIN tl_clients USING(client_id)
JOIN tl_users USING(user_id)

WHERE
appt_id IS NOT NULL AND
time_start between '1293858000' and '1325393999' AND
location_id = '1' and
appt_status_id IN (3) and
user_id IN (506)

GROUP BY user_id, client_id) AS usie

LEFT JOIN

(SELECT client_id, time_start
FROM tl_appt
JOIN tl_rooms USING(room_id)
JOIN tl_clients USING(client_id)

WHERE
appt_id IS NOT NULL AND
location_id = '1' and
appt_status_id IN (3)) AS anie

ON usie.client_id = anie.client_id AND usie.time_start > anie.time_start
GROUP BY usie.client_id




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query weirdness...

2012-05-26 Thread Baron Schwartz
Don,

It is a little hard for me to comment on the query because I don't
have the context I need. When I tackle problems like this I ignore the
query and ask what are you trying to DO? The queries you sent look
like tunnel vision to me; there is probably a better way to do what
you're trying to do, which may not resemble the queries you posted at
all.

That said, I see a number of issues with the queries you posted.
Although they are legal SQL, they are buggy. You can write bugs in SQL
just like any language.  For example, you are using LEFT JOIN
apparently without understanding it fully. Take a look at this
excerpt:

 FROM tl_appt apt
        LEFT JOIN tl_rooms r on r.room_id = apt.room_id
 WHERE
        r.location_id = '1'

You are converting the LEFT JOIN (which is a synonym for LEFT OUTER
JOIN) into an INNER JOIN with the constraint in the WHERE clause. LEFT
JOIN is designed to include rows in the left-hand table that have no
matching rows in the right-hand table (r, in this case), and will fill
the missing cells in the resultset with NULL. But r.location_id will
filter out such results. In general, use LEFT JOIN only when you need
it. Otherwise just use JOIN (which is a synonym for INNER JOIN).
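The difference Baron describes is easy to see in miniature. This sketch uses Python's sqlite3 with toy stand-ins for tl_appt and tl_rooms (names and data are hypothetical); the semantics are the same in MySQL:

```python
import sqlite3

# A WHERE predicate on the right-hand table of a LEFT JOIN discards the
# NULL-padded rows, so the query behaves like an INNER JOIN. Moving the
# predicate into the ON clause keeps the outer-join semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appt (appt_id INTEGER PRIMARY KEY, room_id INTEGER);
    CREATE TABLE rooms (room_id INTEGER PRIMARY KEY, location_id INTEGER);
    INSERT INTO appt VALUES (1, 10), (2, NULL);   -- appt 2 has no room
    INSERT INTO rooms VALUES (10, 1);
""")

# Predicate in WHERE: the unmatched appt row is filtered out (inner-join behavior).
where_rows = conn.execute("""
    SELECT a.appt_id FROM appt a
    LEFT JOIN rooms r ON r.room_id = a.room_id
    WHERE r.location_id = 1
    ORDER BY a.appt_id
""").fetchall()

# Predicate in ON: unmatched appt rows survive with NULLs (true outer join).
on_rows = conn.execute("""
    SELECT a.appt_id FROM appt a
    LEFT JOIN rooms r ON r.room_id = a.room_id AND r.location_id = 1
    ORDER BY a.appt_id
""").fetchall()

print(where_rows)  # [(1,)]
print(on_rows)     # [(1,), (2,)]
```

So if the intent is "only appointments in location 1", a plain JOIN states that directly; a LEFT JOIN is only warranted when room-less appointments should appear too.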

Other problems I see:

- You are joining to the same tables in multiple places. It looks to
me like your query needs refactoring, at the least. I'm not using
refactoring in the Java sense here, but in the sense of algebra. For
example, you know that AB+AC is the same as A(B+C). The redundant
mentions of some of those tables seem to need a similar refactoring to
me: pull out the common terms (tables) and access them only once.

- Your nested subqueries seem to be an overly complex, and possibly
wrong, way to approach the problem. That's just my gut feeling based
on code smell. In general, a subquery in the FROM clause (in your
case, aliased as q1) is only needed when GROUP BY is required, and
using such a subquery for the sole purpose of wrapping a WHERE
q1.previous = 0 around it smells like something needs to be
unwrapped. The WHERE clause could be pushed into the subquery, and the
subquery thus removed.

- Your q1 subquery has non-deterministic behavior because you're
selecting non-grouped columns. You're selecting last and first names,
for example, but those are neither constants nor grouped-by, and thus
are not constant per-group. You are going to get a pseudo-random and
nondeterministic value for each group. This alone could account for
the problems you're seeing. To avoid this problem you can try running
your query with ONLY_FULL_GROUP_BY in the SQL_MODE variable. That
setting will throw an error rather than silently running the query and
returning random values from the group.

- I usually find that subqueries embedded inline into the column
list (in your case, the subqueries for previous and
dr_all_ther_qty) are better replaced by something else. Not for
performance reasons -- but because the set-based thinking becomes a
mixture of set-based and FOREACH-like. You know, for each row I find,
execute this subquery and generate a number of previous XYZ... My
experience has been that this quickly confuses query authors and makes
them write something that's not what they intend. I can't look at the
query and say it's not what you intend, because I don't know your
intention, but again I'm reacting to code smell and gut feeling.
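
To make the FOREACH-versus-set contrast concrete, here is a sketch (sqlite3 as a stand-in, invented data) showing a per-row correlated subquery and an equivalent set-based self-join producing the same "previous" counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tl_appt (client_id INT, time_start INT)")
conn.executemany("INSERT INTO tl_appt VALUES (?, ?)",
                 [(1, 10), (1, 20), (1, 30), (2, 5)])

# FOREACH-style: the correlated subquery runs once per outer row.
correlated = conn.execute("""
    SELECT a.client_id, a.time_start,
           (SELECT COUNT(*) FROM tl_appt p
             WHERE p.client_id = a.client_id
               AND p.time_start < a.time_start) AS previous
    FROM tl_appt a ORDER BY a.client_id, a.time_start
""").fetchall()

# Set-based: one self-join plus GROUP BY yields the same numbers.
joined = conn.execute("""
    SELECT a.client_id, a.time_start, COUNT(p.time_start) AS previous
    FROM tl_appt a
    LEFT JOIN tl_appt p ON p.client_id = a.client_id
                       AND p.time_start < a.time_start
    GROUP BY a.client_id, a.time_start
    ORDER BY a.client_id, a.time_start
""").fetchall()

print(correlated == joined)  # True
```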

Those are just some observations that may be helpful.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql



Re: Query assistance...

2012-05-21 Thread Peter Brawley

On 2012-05-21 11:17 AM, Don Wieland wrote:
I have got this query that returns the correct rows I want to display, 
BUT the columns dr_ther_qty, dr_not_ther_qty, and dr_all_ther_qty are 
not calculating correctly:


--- START QUERY ONE ---

SELECT q1.* FROM

(SELECT  apt.user_id, apt.client_id, c.last_name, c.first_name, 
MIN(apt.time_start) AS stime, 
FROM_UNIXTIME(MIN(apt.time_start),'%Y-%m-%d') AS ftime,


(SELECT count(*) FROM tl_appt WHERE client_id = apt.client_id AND 
appt_status_id = '3' AND time_start < apt.time_start) AS previous,


(SELECT count(*) FROM tl_appt WHERE client_id = apt.client_id AND 
user_id = apt.user_id AND appt_status_id = '3' AND 
FROM_UNIXTIME(apt.time_start,'%Y-%m-%d') between '2012-01-01' and 
'2012-05-20') AS dr_ther_qty,


(SELECT count(*) FROM tl_appt WHERE client_id = apt.client_id AND 
user_id != apt.user_id AND appt_status_id = '3' AND 
FROM_UNIXTIME(apt.time_start,'%Y-%m-%d') between '2012-01-01' and 
'2012-05-20') AS dr_not_ther_qty,


(SELECT DISTINCT count(*) FROM tl_appt WHERE client_id = apt.client_id 
AND appt_status_id = '3' AND FROM_UNIXTIME(apt.time_start,'%Y-%m-%d') 
between '2012-01-01' and '2012-05-20') AS dr_all_ther_qty


FROM tl_appt apt

LEFT JOIN tl_rooms r on r.room_id = apt.room_id
LEFT JOIN tl_clients c on c.client_id = apt.client_id
LEFT JOIN tl_users u on u.user_id = apt.user_id

WHERE apt.appt_id IS NOT NULL AND 
FROM_UNIXTIME(apt.time_start,'%Y-%m-%d') between '2012-01-01' and 
'2012-05-20' AND apt.appt_status_id = '3' and r.location_id = '2' and 
apt.user_id IN (14, 503)

GROUP BY apt.user_id, apt.client_id
ORDER BY u.last_name, u.first_name, c.last_name, c.first_name) as q1

WHERE q1.previous > 0;

--- END QUERY ONE ---

The totals of dr_ther_qty, dr_not_ther_qty, and dr_all_ther_qty are 
not the same if I broke them out into separate queries:


I think it is something to do with the GROUP BY - it is multiplying rows. 


Yes: select a.id,count(*) from a join b using(...) join c using(...) 
will multiply counts. The most effective solution is to move each 
aggregation inside its own FROM clause subquery.
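
The multiplication is easy to reproduce. A sketch (sqlite3 as a stand-in; tables a, b and c are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INT);
CREATE TABLE b (a_id INT);
CREATE TABLE c (a_id INT);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1), (1);       -- 2 b-rows for id 1
INSERT INTO c VALUES (1), (1), (1);  -- 3 c-rows for id 1
""")

# Naive join: every b-row pairs with every c-row, so COUNT(*) = 2*3 = 6.
bad = conn.execute("""
    SELECT COUNT(*) FROM a
    JOIN b ON b.a_id = a.id
    JOIN c ON c.a_id = a.id
""").fetchone()[0]

# Fix: aggregate each table in its own FROM-clause subquery first.
good = conn.execute("""
    SELECT nb, nc FROM a
    JOIN (SELECT a_id, COUNT(*) AS nb FROM b GROUP BY a_id) jb ON jb.a_id = a.id
    JOIN (SELECT a_id, COUNT(*) AS nc FROM c GROUP BY a_id) jc ON jc.a_id = a.id
""").fetchone()

print(bad, good)  # 6 (2, 3)
```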


PB

-


Basically, the rows are correct and I want to use the user_id and 
client_id to calculate the SUB-SELECTS.


Can someone explain why when I run in the MAIN query I get this:

dr_ther_qty = 25
dr_not_ther_qty = 22
dr_all_ther_qty = 47

BUT when I break out that client into the separate queries, I get

SELECT count(*) as dr_ther_qty FROM tl_appt WHERE client_id = 161 AND 
user_id = 503 AND appt_status_id = '3' AND 
FROM_UNIXTIME(time_start,'%Y-%m-%d') between '2012-01-01' and 
'2012-05-20';


dr_ther_qty = 6

SELECT count(*) as dr_not_ther_qty FROM tl_appt WHERE client_id = 161 
AND user_id != 503 AND appt_status_id = '3' AND 
FROM_UNIXTIME(time_start,'%Y-%m-%d') between '2012-01-01' and 
'2012-05-20';


dr_not_ther_qty = 2

SELECT count(*) as dr_all_ther_qty FROM tl_appt WHERE client_id = 161 
AND appt_status_id = '3' AND FROM_UNIXTIME(time_start,'%Y-%m-%d') 
between '2012-01-01' and '2012-05-20';


dr_all_ther_qty = 8


I appreciate any enlightenment on this. Thanks!

Don







RE: Query help,,,

2012-05-17 Thread Rick James
For (1), ponder:

Group-by trick example: Find the most populous city in each state:

SELECT  state, city, population, COUNT(*) AS num_cities
FROM
  ( SELECT  state, city, population
FROM  us
ORDER BY  state, population DESC ) p
GROUP BY  state
ORDER BY  state;
+-------+-------------+------------+------------+
| state | city        | population | num_cities |
+-------+-------------+------------+------------+
| AK    | Anchorage   |     276263 |         16 |
| AL    | Birmingham  |     231621 |         58 |
| AR    | Little Rock |     184217 |         40 |
| AZ    | Phoenix     |    1428509 |         51 |
| CA    | Los Angeles |    3877129 |        447 |
...
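
The ordered-derived-table trick above leans on MySQL's permissive handling of non-grouped columns. A portable way to get the same per-group maximum is to join each row against its group's MAX; a sketch with sqlite3 and a few invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us (state TEXT, city TEXT, population INT)")
conn.executemany("INSERT INTO us VALUES (?, ?, ?)", [
    ("AK", "Anchorage", 276263), ("AK", "Fairbanks", 31324),
    ("AZ", "Phoenix", 1428509), ("AZ", "Tucson", 520116),
])

# Keep only rows whose population equals their state's maximum.
rows = conn.execute("""
    SELECT u.state, u.city, u.population
    FROM us u
    JOIN (SELECT state, MAX(population) AS maxpop
          FROM us GROUP BY state) m
      ON m.state = u.state AND m.maxpop = u.population
    ORDER BY u.state
""").fetchall()

print(rows)  # [('AK', 'Anchorage', 276263), ('AZ', 'Phoenix', 1428509)]
```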

 -Original Message-
 From: Don Wieland [mailto:d...@pointmade.net]
 Sent: Thursday, May 17, 2012 7:37 AM
 To: mysql@lists.mysql.com
 Subject: Query help,,,
 
 Hi folks,
 
 I am trying to compile a query that does statistics on appointments
 based on specific criteria. Here is my starting query:
 
 SELECT
  u.user_id,
  c.client_id,
  c.first_name,
  c.last_name,
  a.time_start AS stime,
  FROM_UNIXTIME(a.time_start,'%Y-%m-%d') AS formatted
 
   FROM tl_appt a
LEFT JOIN tl_users u ON a.user_id = u.user_id
LEFT JOIN tl_clients c ON a.client_id = c.client_id
LEFT JOIN tl_rooms r on a.room_id = r.room_id
 
WHERE a.appt_id IS NOT NULL AND FROM_UNIXTIME(a.time_start,'%Y-%m-
 %d') between '2011-05-01' and '2011-12-31' and r.location_id = '2' and
 a.user_id IN (14) ORDER BY u.last_name, u.first_name, c.last_name,
 c.first_name
 
 This will return a set of rows where a client may have MORE THAN ONE
 appointment. From this set I need to narrow more:
 
 1) Only display the first appointment PER Client. (there will be no
 duplicate client_id)
 
 Then once I have that set of rows established, I need to query for two
 more result:
 
 1) Show New Customers = those where the COUNT of appointments (in the
 entire tl_appt table) LESS THAN the stime = 0
 
 2) Show FORMER Customers = those where the COUNT of appointments (in
 the entire tl_appt table) LESS THAN the stime > 0
 
 I am sure I need a multiple select query, but I am having a hard time
 wrapping my head around it.
 
 Thanks for any feedback.
 
 Don
 



Re: Query help,,,

2012-05-17 Thread Peter Brawley

On 2012-05-17 9:37 AM, Don Wieland wrote:

Hi folks,

I am trying to compile a query that does statistics on appointments 
based on specific criteria. Here is my starting query:


SELECT
u.user_id,
c.client_id,
c.first_name,
c.last_name,
a.time_start AS stime,
FROM_UNIXTIME(a.time_start,'%Y-%m-%d') AS formatted

 FROM tl_appt a
  LEFT JOIN tl_users u ON a.user_id = u.user_id
  LEFT JOIN tl_clients c ON a.client_id = c.client_id
  LEFT JOIN tl_rooms r on a.room_id = r.room_id

  WHERE a.appt_id IS NOT NULL AND 
FROM_UNIXTIME(a.time_start,'%Y-%m-%d') between '2011-05-01' and 
'2011-12-31' and r.location_id = '2' and a.user_id IN (14) ORDER BY 
u.last_name, u.first_name, c.last_name, c.first_name


This will return a set of rows where a client may have MORE THAN ONE 
appointment. From this set I need to narrow more:


1) Only display the first appointment PER Client. (there will be no 
duplicate client_id)


Then once I have that set of rows established, I need to query for two 
more result:


1) Show New Customers = those where the COUNT of appointments (in the 
entire tl_appt table) LESS THAN the stime = 0


2) Show FORMER Customers = those where the COUNT of appointments (in 
the entire tl_appt table) LESS THAN the stime > 0


I am sure I need a multiple select query, but I am having a hard time 
wrapping my head around it.


Thanks for any feedback.


Conceptually the hard bit might be narrowing to the first appt per 
client. For various approaches to this task see Within-group 
aggregates at http://www.artfulsoftware.com/infotree/queries.php.


If new and former clients are to be retrieved from first appts only, you 
might want to save the result of the within-groups aggregate query to a 
result table and query that. If the whole process has to happen often, 
consider developing a wee cube, or just a denormalised result table that 
can be invoked whenever needed.


PB

-



Don






RE: query problem with null

2012-03-09 Thread David Lerer
Have you tried to set city = null   (i.e. without the quotes)? David.



-Original Message-
From: Richard Reina [mailto:gatorre...@gmail.com] 
Sent: Friday, March 09, 2012 4:24 PM
To: mysql@lists.mysql.com
Subject: query problem with null

 When I do the following query:

SELECT * FROM geo_trivia WHERE city IS NULL;

certain columns that DO have 'NULL' value for city and not a '' (blank)
value do not show up.
I have even gone to the extent of reseting these records value as ='NULL'
with UPDATE and they are still are not selected when I run the above
query.  Can anyone help?

The information contained in this e-mail and any attached
documents may be privileged, confidential and protected from
disclosure.  If you are not the intended recipient you may not
read, copy, distribute or use this information.  If you have
received this communication in error, please notify the sender
immediately by replying to this message and then delete it
from your system.




Re: query problem with null

2012-03-09 Thread Richard Reina
Ahhh... Thank you, that was exactly what the problem was. I will fix the
code that is setting the value of these new records to 'NULL'.

Thank you.

2012/3/9 David Lerer dle...@us.univision.com

 Have you tried to set city = null   (i.e. without the quotes)? David.



 -Original Message-
 From: Richard Reina [mailto:gatorre...@gmail.com]
 Sent: Friday, March 09, 2012 4:24 PM
 To: mysql@lists.mysql.com
 Subject: query problem with null

  When I do the following query:

 SELECT * FROM geo_trivia WHERE city IS NULL;

 certain columns that DO have 'NULL' value for city and not a '' (blank)
 value do not show up.
 I have even gone to the extent of reseting these records value as ='NULL'
 with UPDATE and they are still are not selected when I run the above
 query.  Can anyone help?




Re: query problem with null

2012-03-09 Thread Johan De Meersman
- Original Message -
 From: David Lerer dle...@us.univision.com
 
 Have you tried to set city = null   (i.e. without the quotes)?

Spot on, I'd think.

NULL values are not a string with NULL in it - that's only what it looks like 
in query results :-) An empty string ('') is to strings what 0 (zero) is for 
integers: it says the value of this field is nothing. NULL, on the other 
hand, means the value of this field is a total unknown, which is useful, for 
example, in a field 'quantity': zero is still a valid, meaningful quantity; 
whereas you would use NULL to indicate that you simply do not know the quantity.

It's a bit of a peculiar concept, but as David indicated, IS NULL will not 
match fields set to the string NULL - as that is a string, not an unknown.

Another funny attribute of NULL is that NULL != NULL. There simply *is* nothing 
to compare, so you cannot ever say it's equal.
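
The three cases are easy to see side by side. A sketch (sqlite3 as a stand-in; its NULL semantics match MySQL's here, and the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geo_trivia (city TEXT)")
# One real NULL, one string 'NULL', one empty string.
conn.executemany("INSERT INTO geo_trivia VALUES (?)",
                 [(None,), ("NULL",), ("",)])

# IS NULL matches only the real NULL, not 'NULL' or ''.
n_null = conn.execute(
    "SELECT COUNT(*) FROM geo_trivia WHERE city IS NULL").fetchone()[0]
n_str = conn.execute(
    "SELECT COUNT(*) FROM geo_trivia WHERE city = 'NULL'").fetchone()[0]

# NULL = NULL evaluates to NULL (unknown), never TRUE.
eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(n_null, n_str, eq)  # 1 1 None
```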


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel




Re: Query help...

2012-02-29 Thread Singer X.J. Wang
http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_ifnull

On Wed, Feb 29, 2012 at 13:15, Don Wieland d...@pointmade.net wrote:

 Little help...

 In my mySQL query editor, I am trying to return a value of 0 when there is
 no related rows from this query:

 (select if(count(ip.payment_amount) IS NOT NULL, count(ip.payment_amount)
 , 0)  FROM tl_trans_pmt_items ip WHERE t.transaction_id = ip.inv_id GROUP
 BY ip.inv_id) as d,

 regardless of the combination I use, invalid relationships come back as
 NULL - need to return 0  so I can use it in a math formula.

 Probably simple - maybe ;-)

 Don






Re: Query help...

2012-02-29 Thread Michael Heaney

On 2/29/2012 1:15 PM, Don Wieland wrote:

Little help...

In my mySQL query editor, I am trying to return a value of 0 when
there is no related rows from this query:

(select if(count(ip.payment_amount) IS NOT NULL,
count(ip.payment_amount) , 0)  FROM tl_trans_pmt_items ip WHERE
t.transaction_id = ip.inv_id GROUP BY ip.inv_id) as d,

regardless of the combination I use, invalid relationships come back
as NULL - need to return 0  so I can use it in a math formula.

Probably simple - maybe ;-)

Don




I think you want the 'coalesce' function:

Syntax:
COALESCE(value,...)

Returns the first non-NULL value in the list, or NULL if there are no
non-NULL values.

URL: http://dev.mysql.com/doc/refman/5.1/en/comparison-operators.html

Examples:
mysql> SELECT COALESCE(NULL,1);
        -> 1
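
Applied to the problem in this thread, a minimal sketch (sqlite3 as a stand-in; the table name follows the thread but the data is invented): a scalar subquery that matches no rows yields NULL, and COALESCE turns that into 0 so the value is safe in arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tl_trans_pmt_items (inv_id INT, payment_amount REAL)")
conn.execute("INSERT INTO tl_trans_pmt_items VALUES (1, 10.0)")

# COALESCE supplies 0 when the subquery finds nothing (and so returns NULL).
q = """SELECT COALESCE((SELECT SUM(payment_amount)
                        FROM tl_trans_pmt_items WHERE inv_id = ?), 0)"""
paid = conn.execute(q, (1,)).fetchone()[0]      # invoice exists
missing = conn.execute(q, (99,)).fetchone()[0]  # no such invoice

print(paid, missing)  # 10.0 0
```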


Michael Heaney
JCVI





Re: Query query

2011-12-06 Thread Jan Steinman
 From: Arthur Fuller fuller.art...@gmail.com
 
 You should still lose the pointless WHERE 1.

Besides being pointless, is there any harm in having a superfluous WHERE 1 
beyond the few microseconds it takes to parse it and throw it away?

Just curious, because phpMyAdmin includes it in its query template, and I often 
just leave it there when making views or bookmarking queries.


No one can hurt you without your consent. -- Eleanor Roosevelt
 Jan Steinman, EcoReality Co-op 








Re: Query query

2011-12-04 Thread Arthur Fuller
You should still lose the pointless WHERE 1.
Arthur

On Sun, Dec 4, 2011 at 1:38 AM, Jan Steinman j...@bytesmiths.com wrote:

 DOH! Brain unfroze, and I realized I needed an aggregate:

 SELECT
COUNT(lib.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
 FROM s_library_dewey ddn
 LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
 WHERE 1
 GROUP BY ddn.Dewey




Re: Query query

2011-12-04 Thread Halász Sándor
Well, of that which you showed you _need_ only this, not really different from 
that which you wrote:

SELECT
COUNT(lib.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification
FROM s_library_dewey ddn
LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
GROUP BY ddn.Dewey

As for FLOOR in an ON-clause, surely the general-builtin-function overhead 
completely overwhelms the operation's cost. Maybe an index on Dewey would help.

(Which Dewey? with computer under math, c, or with computer under 000? Where 
can one get a 1000-element list for computer?)





Re: Query query

2011-12-03 Thread Jan Steinman
Second attempt, using a join, returns just one row for Dewey 000 with the 
COUNT being about half the volumes in the library, which isn't right...

I thought a LEFT OUTER JOIN would have returned a record for every record in 
s_library_dewey, but it only returns the first.

Brain freeze again...

SELECT
COUNT(lib.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
FROM s_library_dewey ddn
LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
WHERE 1


 I'm having brain freeze, and wonder if anyone can help me with a query.
 
 I have a library in MySQL. There's a table with a record per book, and other 
 tables that it indexes into for meaningful info. One of those is an 
 integer-keyed list of 1,000 Dewey Decimal Codes. In the books table, the 
 Dewey field is decimal. In the Dewey table, it's an integer.
 
 I would like to make a report with the info for each DDC, including whether 
 or not there are any books for any given code's integer part. In other words, 
 I want to bucketize 101.000 to 101.999, etc, for each integer Dewey number, 
 and give some info if the count in that range is non-zero.
 
 I suspect I need a subquery to do this, but my brain is frozen! (Or should I 
 use a join? Can you even join on an inequality?)
 
 The following crashes phpMyAdmin when I try to do it. I suspect it's because 
 the subquery reference to ddn.Dewey is out of context. The subquery works 
 on its own when ddn.Dewey is a literal integer.
 
 SELECT
 (SELECT COUNT(*) > 0 FROM s_library lib WHERE FLOOR(lib.Dewey) = 
 ddn.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
 FROM s_library_dewey
 ddn WHERE 1
 
 Any thoughts on the best way to do this?
 
 Thanks!
 

After providing the wealth on which the city is built, the countryside and its 
people are increasingly seen as dispensable. When it appears that cities can 
thrive on their global connections, rural hinterlands die. -- David Holmgren
 Jan Steinman, EcoReality Co-op 








Re: Query query

2011-12-03 Thread Peter Brawley

On 12/3/2011 9:35 PM, Jan Steinman wrote:

Second attempt, using a join, returns just one row for Dewey 000 with the 
COUNT being about half the volumes in the library, which isn't right...

I thought a LEFT OUTER JOIN would have returned a record for every record in 
s_library_dewey, but it only returns the first.

Brain freeze again...

SELECT
 COUNT(lib.Dewey) AS Have,
 ddn.Dewey AS DDN,
 ddn.Classification AS Classification
FROM s_library_dewey ddn
LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
WHERE 1

Lose WHERE 1, it's useless. Add GROUP BY ddn.Dewey to get all counts.

PB

-





I'm having brain freeze, and wonder if anyone can help me with a query.

I have a library in MySQL. There's a table with a record per book, and other 
tables that it indexes into for meaningful info. One of those is an 
integer-keyed list of 1,000 Dewey Decimal Codes. In the books table, the Dewey 
field is decimal. In the Dewey table, it's an integer.

I would like to make a report with the info for each DDC, including whether or not there 
are any books for any given code's integer part. In other words, I want to 
bucketize 101.000 to 101.999, etc, for each integer Dewey number, and give 
some info if the count in that range is non-zero.

I suspect I need a subquery to do this, but my brain is frozen! (Or should I 
use a join? Can you even join on an inequality?)

The following crashes phpMyAdmin when I try to do it. I suspect it's because the subquery reference 
to ddn.Dewey is out of context. The subquery works on its own when 
ddn.Dewey is a literal integer.

SELECT
(SELECT COUNT(*) > 0 FROM s_library lib WHERE FLOOR(lib.Dewey) = ddn.Dewey) 
AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
FROM s_library_dewey
ddn WHERE 1

Any thoughts on the best way to do this?

Thanks!



After providing the wealth on which the city is built, the countryside and its 
people are increasingly seen as dispensable. When it appears that cities can 
thrive on their global connections, rural hinterlands die. -- David Holmgren
 Jan Steinman, EcoReality Co-op 










Re: Query query

2011-12-03 Thread Jan Steinman
DOH! Brain unfroze, and I realized I needed an aggregate:

SELECT
COUNT(lib.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
FROM s_library_dewey ddn
LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
WHERE 1
GROUP BY ddn.Dewey

... although if there are any brighter ideas, I'm all ears. This doesn't seem 
very optimal to me, with the FLOOR() function in the JOIN and all... takes over 
half a second...
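
For the record, the bucketing itself behaves as hoped: the LEFT JOIN keeps every Dewey bucket and COUNT over the joined column counts only matches, so empty buckets report 0. A sketch (sqlite3 as a stand-in, invented rows; CAST to integer plays the role of FLOOR for these positive values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s_library_dewey (Dewey INT, Classification TEXT);
CREATE TABLE s_library (Dewey REAL);
INSERT INTO s_library_dewey VALUES (100, 'Philosophy'), (101, 'Theory');
INSERT INTO s_library VALUES (101.3), (101.9);
""")

# COUNT(lib.Dewey) ignores the NULLs produced by unmatched buckets.
rows = conn.execute("""
    SELECT ddn.Dewey, COUNT(lib.Dewey) AS Have
    FROM s_library_dewey ddn
    LEFT JOIN s_library lib ON ddn.Dewey = CAST(lib.Dewey AS INT)
    GROUP BY ddn.Dewey
    ORDER BY ddn.Dewey
""").fetchall()

print(rows)  # [(100, 0), (101, 2)]
```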

 Second attempt, using a join, returns just one row for Dewey 000 with the 
 COUNT being about half the volumes in the library, which isn't right...
 
 I thought a LEFT OUTER JOIN would have returned a record for every record in 
 s_library_dewey, but it only returns the first.
 
 Brain freeze again...
 
 SELECT
COUNT(lib.Dewey) AS Have,
ddn.Dewey AS DDN,
ddn.Classification AS Classification
 FROM s_library_dewey ddn
 LEFT OUTER JOIN s_library lib ON ddn.Dewey = FLOOR(lib.Dewey)
 WHERE 1
 
 
 I'm having brain freeze, and wonder if anyone can help me with a query.
 
 I have a library in MySQL. There's a table with a record per book, and other 
 tables that it indexes into for meaningful info. One of those is an 
 integer-keyed list of 1,000 Dewey Decimal Codes. In the books table, the 
 Dewey field is decimal. In the Dewey table, it's an integer.
 
 I would like to make a report with the info for each DDC, including whether 
 or not there are any books for any given code's integer part. In other 
 words, I want to bucketize 101.000 to 101.999, etc, for each integer Dewey 
 number, and give some info if the count in that range is non-zero.
 
 I suspect I need a subquery to do this, but my brain is frozen! (Or should I 
 use a join? Can you even join on an inequality?)
 
 The following crashes phpMyAdmin when I try to do it. I suspect it's because 
 the subquery reference to ddn.Dewey is out of context. The subquery works 
 on its own when ddn.Dewey is a literal integer.
 
 SELECT
   (SELECT COUNT(*) > 0 FROM s_library lib WHERE FLOOR(lib.Dewey) = 
 ddn.Dewey) AS Have,
   ddn.Dewey AS DDN,
   ddn.Classification AS Classification
 FROM s_library_dewey
 ddn WHERE 1
 
 Any thoughts on the best way to do this?
 
 Thanks!
 

After providing the wealth on which the city is built, the countryside and its 
people are increasingly seen as dispensable. When it appears that cities can 
thrive on their global connections, rural hinterlands die. -- David Holmgren
 Jan Steinman, EcoReality Co-op 








Re: query efficiency

2011-09-27 Thread Halász Sándor
 2011/09/27 05:32 -0700, supr_star 
Here's a simple query: select name,status,min(dt), max(dt),count(*) from 
mytable group by name I need an efficient way to get this data, along with the 
status of the row with the max(dt).  'status' is not in the group by, so I get 
apparently random statuses.  Is there any way to do this without a table join?  
Or I guess the real question is:  What is the most efficient way of 
accomplishing this?

I cannot answer the real question, but there is this:

select name,(SELECT status
FROM mytable AS x
WHERE x.name = mytable.name AND x.dt = max(mytable.dt)) AS status,
min(dt), max(dt),count(*) from mytable group by name

I will not vouch that this is more efficient than joining.

(Surely there are enough of this that this is entitled to a special SQL 
construct.)





Re: query optimization

2011-09-22 Thread Ananda Kumar
Your outer query select cpe_mac,max(r3_dt) from rad_r3cap, is doing a full
table scan, you might want to check on this and use a WHERE condition to
use indexed column

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:



  I have a table with 24 million rows, I need to figure out how to optimize
 a query.  It has to do with mac addresses and radius packets - I want to see
 the # of connections and the min/max date. So I basically want all this
 data:

   select cpe_mac,count(*) c,min(r3_dt) mindt,max(r3_dt) maxdt, max(rec_num)
 recn
   from radiuscap
  where r3_dt >= SUBDATE(NOW(),INTERVAL 30 DAY)
 and r3_type='Access'
   group by cpe_mac order by cpe_mac
 ;

 This piece of the query takes 30 seconds to run and produces 3500 rows.  I
 have r3_dt indexed.  I also want a status field of the row with the highest
 r3_dt:

 select rec_num,cpe_mac,req_status
 from rad_r3cap
 where r3_type='Access'
   and (cpe_mac,r3_dt) in (select cpe_mac,max(r3_dt) from rad_r3cap)
 ;

 This piece of the query takes forever,  I let it run for an hour and it
 still didn't finish, it's obviously not using indexes.  I have no idea how
 far along it got.  I wrote a php script to run the 1st query, then do 3500
 individual lookups for the status using the max(rec_num) field in the 1st
 query, and I can get the data in 31 seconds.  So I CAN produce this data,
 but very slowly, and not in 1 sql query.  I want to consolidate this into 1
 sql so I can make a view.

 If anyone can point me in the right direction, I'd appreciate it!



 mysql> desc rad_r3cap;
 +-------------+-------------+------+-----+---------+----------------+
 | Field       | Type        | Null | Key | Default | Extra          |
 +-------------+-------------+------+-----+---------+----------------+
 | rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
 | r3_dt       | datetime    | YES  | MUL | NULL    |                |
 | r3_micros   | int(11)     | YES  |     | NULL    |                |
 | r3_type     | varchar(16) | YES  |     | NULL    |                |
 | req_status  | varchar(16) | YES  |     | NULL    |                |
 | req_comment | varchar(64) | YES  |     | NULL    |                |
 | asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
 | asn_name    | varchar(16) | YES  |     | NULL    |                |
 | bsid        | varchar(12) | YES  | MUL | NULL    |                |
 | cpe_ip      | varchar(16) | YES  |     | NULL    |                |
 | cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
 | filename    | varchar(32) | YES  |     | NULL    |                |
 | linenum     | int(11)     | YES  |     | NULL    |                |
 | r3_hour     | datetime    | YES  | MUL | NULL    |                |
 | user_name   | varchar(64) | YES  |     | NULL    |                |
 +-------------+-------------+------+-----+---------+----------------+

 mysql> show indexes in rad_r3cap;
 +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
 | Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
 +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
 | rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 |     NULL | NULL   |      | BTREE      |         |
 | rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
 | rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
 | rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 |     NULL | NULL   | YES  | BTREE      |         |
 | rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 |     NULL | NULL   | YES  | BTREE      |         |
 | rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 |     NULL | NULL   | YES  | BTREE      |         |
 | rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 |     NULL | NULL   | YES  | BTREE      |         |
 +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
 7 rows in set (0.00 sec)



Re: Query Optimization

2011-09-08 Thread Brandon Phelps

On 09/01/2011 01:32 PM, Brandon Phelps wrote:

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  ...

  WHERE
  (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query window 
with two markers (s) and (e). Events will be marked by || markers showing 
their durations.

a)         (s)                     (e)
b) |---|
c)       |------|
d)              |------|
e)      |------------------------------|
f)                              |---------|
g)                                        |------|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) 
is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of 
your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a 
WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.
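
The overlap test can be checked mechanically. A small sketch (plain Python, with an invented numeric window standing in for the datetimes):

```python
def overlaps(start, end, s, e):
    """True iff event [start, end] touches window [s, e]: the event starts
    no later than the window ends AND ends no earlier than the window
    starts -- i.e. open_dt <= e AND close_dt >= s."""
    return start <= e and end >= s

# The cases b-g from the ASCII art, with window (s, e) = (10, 20):
print(overlaps(1, 5, 10, 20))    # b: entirely before -> False
print(overlaps(8, 12, 10, 20))   # c: straddles start -> True
print(overlaps(12, 15, 10, 20))  # d: inside          -> True
print(overlaps(5, 25, 10, 20))   # e: spans window    -> True
print(overlaps(18, 25, 10, 20))  # f: straddles end   -> True
print(overlaps(22, 30, 10, 20))  # g: entirely after  -> False
```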


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

Re: Query Optimization

2011-09-08 Thread Andrew Moore
Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:

 On 09/01/2011 01:32 PM, Brandon Phelps wrote:

 On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

 On 9/1/2011 09:42, Brandon Phelps wrote:

 On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
   ...

   WHERE
    (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
    AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

  In that case your logic here simplifies to:
  WHERE
  open_dt = '2011-08-30 00:00:00'
  AND
  close_dt = '2011-08-30 12:36:53'

  Now add an index over open_dt and close_dt and see what happens.

  Jochem

 Jochem,

 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so given
 your logic if the connection was started 2 days ago and I want to pull 1
 days worth of connections, I would miss that entry. Basically I want to
 SELECT all of the records that were opened OR closed during the
 specified time period, ie. if any activity happened between my start and
 end dates, I need to see that record.

 Any other ideas?


 I believe Jochem was on the right track but he got his dates reversed.

 Let's try a little ASCII art to show the situation. I will setup a query
 window with two markers (s) and (e). Events will be marked by || markers
 showing their durations.

 a)           (s)                   (e)
 b)   |---|
 c)        |-------|
 d)                |------|
 e)       |-------------------------------|
 f)                               |------|
 g)                                          |---|

 To describe these situations:
 a) is the window for which you want to query (s) is the starting time and
 (e) is the ending time for the date range you are interested in.
 b) the event starts and stops before your window exists. It won't be part
 of your results.
 c) the event starts before the window but ends within the window -
 include this
 d) the event starts and ends within the window - include this
 e) the event starts before the window and ends after the window - include
 this
 f) the event starts inside the window but ends beyond the window -
 include this.
 g) the event starts and ends beyond the window - exclude this.

 In order to get every event in the range of c-f, here is what you need
 for a WHERE clause

 WHERE start <= (ending time) AND end >= (starting time)

 Try that and let us know the results.
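
 Shawn's predicate can be sanity-checked against the six cases above with a toy
 table. The sketch below uses Python's stdlib sqlite3 as a stand-in for MySQL;
 the table name, labels, and timestamps are all invented for the illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE conn (label TEXT, open_dt TEXT, close_dt TEXT)")
# One event per case from the ASCII art; the query window is 10:00 .. 11:00.
con.executemany("INSERT INTO conn VALUES (?, ?, ?)", [
    ("b", "2011-08-30 08:00:00", "2011-08-30 09:00:00"),  # ends before window
    ("c", "2011-08-30 09:30:00", "2011-08-30 10:30:00"),  # straddles (s)
    ("d", "2011-08-30 10:15:00", "2011-08-30 10:45:00"),  # entirely inside
    ("e", "2011-08-30 09:00:00", "2011-08-30 12:00:00"),  # spans whole window
    ("f", "2011-08-30 10:30:00", "2011-08-30 11:30:00"),  # straddles (e)
    ("g", "2011-08-30 11:30:00", "2011-08-30 12:00:00"),  # starts after window
])
rows = con.execute(
    "SELECT label FROM conn "
    "WHERE open_dt  <= '2011-08-30 11:00:00' "   # start <= (ending time)
    "AND   close_dt >= '2011-08-30 10:00:00' "   # end   >= (starting time)
    "ORDER BY label").fetchall()
print([r[0] for r in rows])  # ['c', 'd', 'e', 'f']
```

 Exactly cases c through f come back, matching the inclusion rules described
 above.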


 Thanks Jochem and Shawn, however the following two queries result in the
 exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

 Old method:
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
 AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
 ORDER BY rcvd DESC
 LIMIT 0, 10;

 New method with BTREE index on open_dt, close_dt (index name is
 ndx_open_close_dt):
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
 ORDER BY rcvd DESC
 LIMIT 0, 10;

 EXPLAIN output for old method:
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

 EXPLAIN output for new method with new index:
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 
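For what it's worth, the two WHERE clauses Brandon compares are row-for-row
equivalent whenever open_dt <= close_dt holds for every record, which is easy
to confirm on a toy table (Python's stdlib sqlite3 here as a stand-in for
MySQL; the data is invented for the check):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE conn (open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO conn VALUES (?, ?)", [
    ("2011-08-30 08:00:00", "2011-08-30 09:00:00"),  # closed before the range
    ("2011-08-31 10:00:00", "2011-08-31 12:00:00"),  # inside the range
    ("2011-08-30 09:00:00", "2011-09-02 12:00:00"),  # spans the whole range
    ("2011-09-01 08:00:00", "2011-09-01 10:00:00"),  # straddles the range end
    ("2011-09-01 10:00:00", "2011-09-02 12:00:00"),  # opened after the range
])
s, e = "2011-08-31 09:53:31", "2011-09-01 09:53:31"
old = con.execute(
    "SELECT rowid FROM conn WHERE (open_dt >= ? OR close_dt >= ?) "
    "AND (open_dt <= ? OR close_dt <= ?) ORDER BY rowid",
    (s, s, e, e)).fetchall()
new = con.execute(
    "SELECT rowid FROM conn WHERE open_dt <= ? AND close_dt >= ? "
    "ORDER BY rowid", (e, s)).fetchall()
assert old == new  # same rows either way, so identical EXPLAINs make sense
```

Since the optimizer sees the same result set either way, identical EXPLAIN
output for the two forms is unsurprising.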

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com  wrote:


On 09/01/2011 01:32 PM, Brandon Phelps wrote:


On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


On 9/1/2011 09:42, Brandon Phelps wrote:


On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...



WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')


In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'



Now add an index over open_dt and close_dt and see what happens.



Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by || markers
showing their durations.

a)           (s)                   (e)
b)   |---|
c)        |-------|
d)                |------|
e)       |-------------------------------|
f)                               |------|
g)                                          |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and
(e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part
of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include
this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.



Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

Re: Query Optimization

2011-09-08 Thread Andrew Moore
I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the EXPLAIN look like when you remove the LIMIT 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:

 Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.



 On 09/08/2011 02:16 PM, Andrew Moore wrote:

 Thinking outside the query, is there any archiving that could happen to
 make
 your large tables kinder in the range scan?

 Andy

 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com  wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:

  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

  On 9/1/2011 09:42, Brandon Phelps wrote:

  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

 ...


  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

  In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'


  Now add an index over open_dt and close_dt and see what happens.


  Jochem


 Jochem,

 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so given
 your logic if the connection was started 2 days ago and I want to pull
 1
 days worth of connections, I would miss that entry. Basically I want
 to
 SELECT all of the records that were opened OR closed during the
 specified time period, ie. if any activity happened between my start
 and
 end dates, I need to see that record.

 Any other ideas?


  I believe Jochem was on the right track but he got his dates
 reversed.

 Let's try a little ASCII art to show the situation. I will setup a
 query
 window with two markers (s) and (e). Events will be marked by ||
 markers
 showing their durations.

 a)           (s)                   (e)
 b)   |---|
 c)        |-------|
 d)                |------|
 e)       |-------------------------------|
 f)                               |------|
 g)                                          |---|

 To describe these situations:
 a) is the window for which you want to query (s) is the starting time
 and
 (e) is the ending time for the date range you are interested in.
 b) the event starts and stops before your window exists. It won't be
 part
 of your results.
 c) the event starts before the window but ends within the window -
 include this
 d) the event starts and ends within the window - include this
 e) the event starts before the window and ends after the window -
 include
 this
 f) the event starts inside the window but ends beyond the window -
 include this.
 g) the event starts and ends beyond the window - exclude this.

 In order to get every event in the range of c-f, here is what you need
 for a WHERE clause

 WHERE start <= (ending time) AND end >= (starting time)

 Try that and let us know the results.


 Thanks Jochem and Shawn, however the following two queries result in the
 exact same EXPLAIN output: (I hope the tables don't wrap too early for
 you)

 Old method:
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
 AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
 ORDER BY rcvd DESC
 LIMIT 0, 10;

 New method with BTREE index on open_dt, close_dt (index name is
 ndx_open_close_dt):
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
 ORDER BY rcvd DESC
 LIMIT 0, 10;

 EXPLAIN output for old method:
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 | 1 | SIMPLE | sc | index | 

Re: Query Optimization

2011-09-08 Thread Mihail Manolov
If you're running version 5.1+ you may wanna take a look at table partitioning 
options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

 Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
 date ranges and new data is constantly coming in, so I am not sure how I 
 could archive/cache the necessary data that would be any more efficient than 
 simply using the database directly.
 
 
 On 09/08/2011 02:16 PM, Andrew Moore wrote:
 Thinking outside the query, is there any archiving that could happen to make
 your large tables kinder in the range scan?
 
 Andy
 
 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com  wrote:
 
 On 09/01/2011 01:32 PM, Brandon Phelps wrote:
 
 On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
 
 On 9/1/2011 09:42, Brandon Phelps wrote:
 
 On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
 ...
 
 WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
 
 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'
 
 Now add an index over open_dt and close_dt and see what happens.
 
 Jochem
 
 Jochem,
 
 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so given
 your logic if the connection was started 2 days ago and I want to pull 1
 days worth of connections, I would miss that entry. Basically I want to
 SELECT all of the records that were opened OR closed during the
 specified time period, ie. if any activity happened between my start and
 end dates, I need to see that record.
 
 Any other ideas?
 
 
 I believe Jochem was on the right track but he got his dates reversed.
 
 Let's try a little ASCII art to show the situation. I will setup a query
 window with two markers (s) and (e). Events will be marked by || 
 markers
 showing their durations.
 
 a)           (s)                   (e)
 b)   |---|
 c)        |-------|
 d)                |------|
 e)       |-------------------------------|
 f)                               |------|
 g)                                          |---|
 
 To describe these situations:
 a) is the window for which you want to query (s) is the starting time and
 (e) is the ending time for the date range you are interested in.
 b) the event starts and stops before your window exists. It won't be part
 of your results.
 c) the event starts before the window but ends within the window -
 include this
 d) the event starts and ends within the window - include this
 e) the event starts before the window and ends after the window - include
 this
 f) the event starts inside the window but ends beyond the window -
 include this.
 g) the event starts and ends beyond the window - exclude this.
 
 In order to get every event in the range of c-f, here is what you need
 for a WHERE clause
 
 WHERE start <= (ending time) AND end >= (starting time)
 
 Try that and let us know the results.
 
 
 Thanks Jochem and Shawn, however the following two queries result in the
 exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
 
 Old method:
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
 AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
 ORDER BY rcvd DESC
 LIMIT 0, 10;
 
 New method with BTREE index on open_dt, close_dt (index name is
 ndx_open_close_dt):
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
 ORDER BY rcvd DESC
 LIMIT 0, 10;
 
 EXPLAIN output for old method:
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
 +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd  | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd  | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the EXPLAIN look like when you remove the LIMIT 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com  wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting
varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.



On 09/08/2011 02:16 PM, Andrew Moore wrote:


Thinking outside the query, is there any archiving that could happen to
make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:


  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


  On 9/1/2011 09:42, Brandon Phelps wrote:


  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:



...





  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

  In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'



  Now add an index over open_dt and close_dt and see what happens.




  Jochem




Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull
1
days worth of connections, I would miss that entry. Basically I want
to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start
and
end dates, I need to see that record.

Any other ideas?


  I believe Jochem was on the right track but he got his dates

reversed.

Let's try a little ASCII art to show the situation. I will setup a
query
window with two markers (s) and (e). Events will be marked by ||
markers
showing their durations.

a)           (s)                   (e)
b)   |---|
c)        |-------|
d)                |------|
e)       |-------------------------------|
f)                               |------|
g)                                          |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting 

Re: Query Optimization

2011-09-08 Thread Derek Downey
Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting 
time) <= close_dt AND close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
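
Derek's rewrite can be sketched on a toy table. The example below uses
Python's stdlib sqlite3 (whose multi-index OR optimization is only an
analogue of MySQL's index_merge); the table, index names, and data are all
invented for the illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE conn (open_dt TEXT, close_dt TEXT)")
con.execute("CREATE INDEX ndx_open  ON conn (open_dt)")   # two single-column
con.execute("CREATE INDEX ndx_close ON conn (close_dt)")  # indexes, as suggested
con.executemany("INSERT INTO conn VALUES (?, ?)", [
    ("2011-09-07 09:00:00", "2011-09-07 10:00:00"),  # closed before the window
    ("2011-09-07 12:00:00", "2011-09-07 14:00:00"),  # close_dt falls in window
    ("2011-09-08 14:00:00", "2011-09-08 15:00:00"),  # opened after the window
])
n = con.execute(
    "SELECT COUNT(*) FROM conn "
    "WHERE (open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') "
    "OR    (close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')"
).fetchone()[0]
print(n)  # 1 -- only the row whose close_dt lies inside the window
```

Each side of the OR is a plain range over one indexed column, which is the
shape the index-merge optimization needs.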

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

 Andy,
 
 The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
 server 11.04.  Unfortunately the machine only has 2GB of RAM but no other 
 major daemons are running on the machine.  We are running RAID 1 (mirroring) 
 with 1TB drives.  The tables in question here are all MyISAM.  When running 
 with the LIMIT 10 my EXPLAIN is:
 
 +----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
 | id | select_type | table | type   | possible_keys        | key                 | key_len | ref                | rows     | Extra                       |
 +----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
 |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd  | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
 +----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
 
 When I remove the LIMIT 10 I get:
 
 +----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
 | id | select_type | table | type   | possible_keys        | key     | key_len | ref                | rows     | Extra                       |
 +----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
 |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd  | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
 +----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
 
 Thanks for all your help thus far.
 
 On 09/08/2011 02:38 PM, Andrew Moore wrote:
 I don't think I saw any query timings in the emails (maybe I missed them).
 
 What version of MySQL are you currently using?
 What does the EXPLAIN look like when you remove the LIMIT 10?
 Is your server tuned for MyISAM or InnoDB?
 What kind of disk setup is in use?
 How much memory is in your machine?
 
 
 On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com  wrote:
 
 Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.
 
 
 
 On 09/08/2011 02:16 PM, Andrew Moore wrote:
 
 Thinking outside the query, is there any archiving that could happen to
 make
 your large tables kinder in the range scan?
 
 Andy
 
 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:
 
  On 09/01/2011 01:32 PM, Brandon Phelps wrote:
 
  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
 
  On 9/1/2011 09:42, Brandon Phelps wrote:
 
  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
 
 ...
 
 
  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
 
  In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'
 
 
  Now add an index over open_dt and close_dt and see what happens.
 
 
  Jochem
 
 
 Jochem,
 
 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so given
 your logic if the connection was started 2 days ago and I want to pull
 1
 days worth of connections, I would miss that entry. Basically I want
 to
 SELECT all of the records that were opened 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

I have considered this but have not yet determined how best to go about 
partitioning the table.  I don't think partitioning by dst_address or 
src_address would help because most of the queries do not filter on IP address 
(except very specific queries where the end-user is searching the table for 
history on a particular employee).

I could potentially partition the table based on the day of week the connection 
was opened on which may improve performance for a while since this would take 
me from a single 32million record table down to roughly 4.5 million records per 
partition (32 / 7) however we are looking to eventually store up to 2 months 
worth of data in the table, and so far that 32 million records is only for 1 
month, so I estimate another 32 million-ish before the month is out, bringing 
me to roughly 70 million records total (it would be nice if I could store even 
more than 2 months, but given my currently performance dilemma I don't expect 
that to happen).  Also this does not take into account that the end-user will 
often be pulling data for multiple days at a time, meaning that multiple 
partitions in this scheme will need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition 
based on dst_port (the port the log relates to) but the majority of records are 
all to port 80 (standard web traffic) so I don't think this would be all that 
helpful.

I have never messed with partitioning so it is totally possible that I am not 
thinking of something, so if you have any ideas on a decent partitioning scheme 
based on my criteria and queries below, please let me know.

Thanks,
Brandon
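
The day-of-week scheme described above can be modeled in a few lines to show
the multi-partition access it implies. This is a plain Python sketch, not
MySQL partitioning syntax; the pruning logic is a simplification of what the
partition engine does, and all names and data are illustrative:

```python
from datetime import date, timedelta

# Toy model of partitioning by day of week: 7 buckets keyed 0 (Mon) .. 6 (Sun).
partitions = {d: [] for d in range(7)}

def insert(open_dt, row):
    partitions[open_dt.weekday()].append((open_dt, row))

start = date(2011, 9, 1)
for i in range(60):  # two months of one-connection-per-day sample data
    insert(start + timedelta(days=i), f"conn-{i}")

def query(lo, hi):
    # Pruning: only partitions whose weekday occurs inside [lo, hi] are scanned.
    days = {(lo + timedelta(days=i)).weekday()
            for i in range((hi - lo).days + 1)}
    hits = [row for d in days for (dt, row) in partitions[d] if lo <= dt <= hi]
    return sorted(hits), sorted(days)

rows, scanned = query(date(2011, 9, 5), date(2011, 9, 7))
print(len(rows), scanned)  # a 3-day window still opens 3 of the 7 partitions
```

As Brandon notes, any range wider than a day touches several partitions, and a
week-long range touches all seven, so this scheme prunes little for typical
queries.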
 


On 09/08/2011 02:47 PM, Mihail Manolov wrote:

If you're running version 5.1+ you may wanna take a look at table partitioning 
options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:


On 09/01/2011 01:32 PM, Brandon Phelps wrote:


On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


On 9/1/2011 09:42, Brandon Phelps wrote:


On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...



WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')


In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'



Now add an index over open_dt and close_dt and see what happens.



Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by || markers
showing their durations.

a)           (s)                   (e)
b)   |---|
c)        |-------|
d)                |------|
e)       |-------------------------------|
f)                               |------|
g)                                          |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and
(e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part
of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include
this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.



Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt AND open_dt <= '2011-09-08 13:18:58') OR 
('2011-09-07 13:18:58' <= close_dt AND close_dt <= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).



On 09/08/2011 02:55 PM, Derek Downey wrote:

Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time) <= open_dt <= (ending time)) OR
((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:


Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when your remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is 

Re: Query Optimization

2011-09-08 Thread Andrew Moore
Partitioning isn't a bad idea for this, however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into the relevant buffers/memory, so you're going to be on disk more than you
want. You're probably creating temporary tables like crazy, and I would bet
that a good load of them are heading to disk too. With your IO performance
limited to a small number of disks as you describe, you're not going to be
able to get much more from these queries. Although it's a dedicated DB server,
are there other databases being accessed on it? When looking at the scope of
your data, are you capturing more than you need? How often and how far back
are the users querying? How many users are concurrently performing queries on
the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

 Mihail,

 I have considered this but have not yet determined how best to go about
 partitioning the table.  I don't think partitioning by dst_address or
 src_address would help because most of the queries do not filter on IP
 address (except very specific queries where the end-user is searching the
 table for history on a particular employee).

 I could potentially partition the table based on the day of week the
 connection was opened on which may improve performance for a while since
 this would take me from a single 32million record table down to roughly 4.5
 million records per partition (32 / 7) however we are looking to eventually
 store up to 2 months worth of data in the table, and so far that 32 million
 records is only for 1 month, so I estimate another 32 million-ish before the
 month is out, bringing me to roughly 70 million records total (it would be
 nice if I could store even more than 2 months, but given my currently
 performance dilemma I don't expect that to happen).  Also this does not take
 into account that the end-user will often be pulling data for multiple days
 at a time, meaning that multiple partitions in this scheme will need to be
 accessed anyway.

 The only other logical partitioning scheme I can think of would be to
 partition based on dst_port (the port the log relates to) but the majority
 of records are all to port 80 (standard web traffic) so I don't think this
 would be all that helpful.

 I have never messed with partitioning so it is totally possible that I am
 not thinking of something, so if you have any ideas on a decent partitioning
 scheme based on my criteria and queries below, please let me know.

 Thanks,
 Brandon


 On 09/08/2011 02:47 PM, Mihail Manolov wrote:

 If you're running version 5.1+ you may wanna take a look at table
 partitioning options you may have.

 On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

  Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.


 On 09/08/2011 02:16 PM, Andrew Moore wrote:

 Thinking outside the query, is there any archiving that could happen to
 make
 your large tables kinder in the range scan?

 Andy

 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com
 wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:

  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

  On 9/1/2011 09:42, Brandon Phelps wrote:

  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

 ...


 WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'


  Now add an index over open_dt and close_dt and see what happens.


  Jochem


 Jochem,

 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so
 given
 your logic if the connection was started 2 days ago and I want to
 pull 1
 days worth of connections, I would miss that entry. Basically I want
 to
 SELECT all of the records that were opened OR closed during the
 specified time period, ie. if any activity happened between my start
 and
 end dates, I need to see that record.

 Any other ideas?


  I believe Jochem was on the right track but he got his dates
 reversed.

 Let's try a little ASCII art to show the situation. I will setup a query
 window with two markers (s) and (e). Events will be marked by |---|
 markers showing their durations.

 a)        (s)              (e)
 b) |---|
 c)       |---|
 d)           |---|
 e)      |--------------------|
 f)                       |---|
 g)                           |---|

 To describe these situations:
 a) is the window for which you want to query (s) is the starting time
 and
 (e) is the ending time for the date range you are interested in.
 b) the event starts and stops before your window exists. It won't be

Re: Query Optimization

2011-09-08 Thread Mihail Manolov
How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

 Thanks for the idea Derek, however given the following query my EXPLAIN 
 output is identical:
 
 SELECT
   sc.open_dt,
   sc.close_dt,
   sc.protocol,
   INET_NTOA(sc.src_address) AS src_address,
   sc.src_port,
   INET_NTOA(sc.dst_address) AS dst_address,
   sc.dst_port,
   sc.sent,
   sc.rcvd,
   spm.desc AS src_port_desc,
   dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
   LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
   LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR 
 ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');
 
 
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
 
 I did create indexes on open_dt and close_dt (2 separate indexes).
 
 
 
 On 09/08/2011 02:55 PM, Derek Downey wrote:
 Correct me if I'm wrong. You're wanting to get all records that have an 
 open_date or a close_date between two times.
 
 If that's correct, you might be able to get an index_merge by doing a query 
 like:
 
 WHERE ((starting time) <= open_dt <= (ending time)) OR
 ((starting time) <= close_dt <= (ending time))
 
 and creating two indexes (one on 'open_dt' and the other on 'close_dt')
 
 http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
 
 Regards,
 Derek
 
 On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
 
 Andy,
 
 The queries take minutes to run.  MySQL is 5.1.54 and it's running on 
 Ubuntu server 11.04.  Unfortunately the machine only has 2GB of RAM but no 
 other major daemons are running on the machine.  We are running RAID 1 
 (mirroring) with 1TB drives.  The tables in question here are all MyISAM.  
 When running with the LIMIT 10 my EXPLAIN is:
 
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 
 When I remove the LIMIT 10 I get:
 
 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time, however 
the background process that fills the table is constantly doing a fast SELECT (0.3 
seconds) and a subsequent UPDATE.  Basically whenever a connection is closed on the 
firewall, the bg process SELECTS from the table the last record that was opened (and 
hasn't already been closed) for a given source IP, source port, dest ip, and protocol.  
So for example whenever the firewall logs a CONNECTION CLOSED message, the bg 
process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = 
x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATES the close_dt column for the record with the selected ID.  These 
select/update statements being run by the background process generally take 
0.000x seconds each.
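The select-then-update flow described above can be sketched as follows (a toy schema and data, with sqlite3 standing in for MySQL; note I've added an explicit `close_dt IS NULL` guard for the "hasn't already been closed" condition, which the quoted query implies but doesn't show):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE conns (
    id INTEGER PRIMARY KEY, src TEXT, sport INTEGER, dst TEXT,
    proto TEXT, open_dt TEXT, close_dt TEXT)""")
# Two overlapping connections for the same 4-tuple, both still open.
conn.executemany(
    "INSERT INTO conns (src, sport, dst, proto, open_dt) VALUES (?,?,?,?,?)",
    [("10.0.0.5", 1234, "1.2.3.4", "tcp", "2011-09-08 10:00:00"),
     ("10.0.0.5", 1234, "1.2.3.4", "tcp", "2011-09-08 11:00:00")])

def close_connection(src, sport, dst, proto, when):
    # Find the oldest still-open record for this 4-tuple (ORDER BY open_dt
    # ascending, as in the thread's query), then stamp its close_dt.
    row = conn.execute(
        "SELECT id FROM conns WHERE src=? AND sport=? AND dst=? AND proto=? "
        "AND close_dt IS NULL ORDER BY open_dt LIMIT 1",
        (src, sport, dst, proto)).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE conns SET close_dt=? WHERE id=?", (when, row[0]))
    return row[0]
```

Each CONNECTION CLOSED log line would trigger one such call; successive closes for the same tuple consume the open records oldest-first.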

The only slowdown we see is with the complicated SELECT I have below, and this 
is only executed once in a while throughout the day.  For example, if an IT 
manager sees a huge bandwidth spike on the network, he may access the web gui 
to determine what the source IP is so he can determine who is downloading a 
large file, etc.

I think what I will try to do is create 60 partitions for the table in question 
based on month and day.  This way each day will go into its own partition and 
if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it 
will only need to access 7 partitions instead of the entire table.

My only question with this is how would I go about creating the table with 60 
partitions in such a way that I won't need to constantly add/remove new/old 
partitions every day?  Is there any partitioning function I can use that would 
not require me to update the partitions schema every day?  I already plan to 
have a cron run each night to purge records older than 60 days from the 
database.
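MySQL 5.1 has no self-rotating partition function, so the nightly cron would also have to rotate partitions. A common sketch (hypothetical partition names, abbreviated list; note 5.1 requires the partitioning column to appear in every unique key, so the primary key would need to include open_dt) uses RANGE partitioning on TO_DAYS(open_dt):

```sql
-- Sketch only: assumes PRIMARY KEY (id, open_dt) or similar.
ALTER TABLE sonicwall_connections
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p20110101 VALUES LESS THAN (TO_DAYS('2011-01-02')),
    PARTITION p20110102 VALUES LESS THAN (TO_DAYS('2011-01-03')),
    -- ... one partition per day ...
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Nightly rotation from cron: drop the oldest day and carve
-- tomorrow's partition out of the catch-all pmax partition.
ALTER TABLE sonicwall_connections DROP PARTITION p20110101;
ALTER TABLE sonicwall_connections REORGANIZE PARTITION pmax INTO (
    PARTITION p20110103 VALUES LESS THAN (TO_DAYS('2011-01-04')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);
```

One caveat: queries that filter only on close_dt would still touch every partition, since pruning works only on the partitioning column.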

On 09/08/2011 03:26 PM, Andrew Moore wrote:

Partitioning isn't a bad idea for this, however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into the relevant buffers/memory, so you're going to be on disk more than you
want. You're probably creating temporary tables like crazy, and I would bet
that a good load of them are heading to disk too. With your IO performance
limited to a small number of disks as you describe, you're not going to be
able to get much more from these queries. Although it's a dedicated DB server,
are there other databases being accessed on it? When looking at the scope of
your data, are you capturing more than you need? How often and how far back
are the users querying? How many users are concurrently performing queries on
the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com  wrote:


Mihail,

I have considered this but have not yet determined how best to go about
partitioning the table.  I don't think partitioning by dst_address or
src_address would help because most of the queries do not filter on IP
address (except very specific queries where the end-user is searching the
table for history on a particular employee).

I could potentially partition the table based on the day of week the
connection was opened on which may improve performance for a while since
this would take me from a single 32million record table down to roughly 4.5
million records per partition (32 / 7) however we are looking to eventually
store up to 2 months worth of data in the table, and so far that 32 million
records is only for 1 month, so I estimate another 32 million-ish before the
month is out, bringing me to roughly 70 million records total (it would be
nice if I could store even more than 2 months, but given my currently
performance dilemma I don't expect that to happen).  Also this does not take
into account that the end-user will often be pulling data for multiple days
at a time, meaning that multiple partitions in this scheme will need to be
accessed anyway.

The only other logical partitioning scheme I can think of would be to
partition based on dst_port (the port the log relates to) but the majority
of records are all to port 80 (standard web traffic) so I don't think this
would be all that helpful.

I have never messed with partitioning so it is totally possible that I am
not thinking of something, so if you have any ideas on a decent partitioning
scheme based on my criteria and queries below, please let me know.

Thanks,
Brandon


On 09/08/2011 02:47 PM, Mihail Manolov wrote:


If you're running version 5.1+ you may wanna take a look at table
partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

  Thanks for the reply Andy.  Unfortunately the users will be selecting

varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1   | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|  1   | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1   | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2   | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2   | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2   | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:


Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings 

Re: Query Optimization

2011-09-08 Thread Mihail Manolov
From the manual: "The default behavior for UNION is that duplicate rows are
removed from the result."
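The manual's point is easy to demonstrate with a toy version of the two-branch query (hypothetical data, sqlite3 standing in for MySQL): a row whose open_dt and close_dt both fall in the range matches both branches, but UNION emits it once, whereas UNION ALL would emit it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (id INTEGER, open_dt TEXT, close_dt TEXT)")
conn.executemany("INSERT INTO c VALUES (?, ?, ?)", [
    (1, "2011-09-07 14:00:00", "2011-09-07 15:00:00"),  # opened AND closed in range
    (2, "2011-09-08 13:00:00", "2011-09-09 01:00:00"),  # only opened in range
    (3, "2011-09-06 10:00:00", "2011-09-07 14:00:00"),  # only closed in range
])
lo, hi = "2011-09-07 13:18:58", "2011-09-08 13:18:58"

def run(op):
    # op is "UNION" (deduplicates) or "UNION ALL" (keeps duplicates)
    sql = ("SELECT id FROM c WHERE open_dt BETWEEN ? AND ? "
           + op +
           " SELECT id FROM c WHERE close_dt BETWEEN ? AND ?")
    return sorted(r[0] for r in conn.execute(sql, (lo, hi, lo, hi)))
```

Row 1 matches both branches yet appears once under UNION; under UNION ALL it appears twice.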

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:

 Mihail,
 
 Thanks so much!  I modified your example to include the proper ORDER BY and 
 LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
 Question, if a record's open_dt is between the range AND the close_dt is 
 between the range as well, will the UNION output the record twice?  If so, is 
 there any way to prevent that?
 
 (SELECT
   sc.open_dt,
   sc.close_dt,
   sc.protocol,
   INET_NTOA(sc.src_address) AS src_address,
   sc.src_port,
   INET_NTOA(sc.dst_address) AS dst_address,
   sc.dst_port,
   sc.sent,
   sc.rcvd,
   spm.desc AS src_port_desc,
   dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
   LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
   LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
 UNION
 (SELECT
   sc.open_dt,
   sc.close_dt,
   sc.protocol,
   INET_NTOA(sc.src_address) AS src_address,
   sc.src_port,
   INET_NTOA(sc.dst_address) AS dst_address,
   sc.dst_port,
   sc.sent,
   sc.rcvd,
   spm.desc AS src_port_desc,
   dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
   LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
   LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
 ORDER BY rcvd DESC LIMIT 10;
 
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1   | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|  1   | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1   | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2   | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2   | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2   | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
 
 
 
 On 09/08/2011 03:45 PM, Mihail Manolov wrote:
 How about:
 
 SELECT
  sc.open_dt,
  sc.close_dt,
  sc.protocol,
  INET_NTOA(sc.src_address) AS src_address,
  sc.src_port,
  INET_NTOA(sc.dst_address) AS dst_address,
  sc.dst_port,
  sc.sent,
  sc.rcvd,
  spm.desc AS src_port_desc,
  dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
  LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
  LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 
 UNION
 
 SELECT
  sc.open_dt,
  sc.close_dt,
  sc.protocol,
  INET_NTOA(sc.src_address) AS src_address,
  sc.src_port,
  INET_NTOA(sc.dst_address) AS dst_address,
  sc.dst_port,
  sc.sent,
  sc.rcvd,
  spm.desc AS src_port_desc,
  dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
  LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
  LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 
 
 
 On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:
 
 Thanks for the idea Derek, however given the following query my EXPLAIN 
 output is identical:
 
 SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA(sc.src_address) AS src_address,
 sc.src_port,
 INET_NTOA(sc.dst_address) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
 FROM 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Ah I see.  Well thanks for your assistance!

-Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote:

 From the manual: "The default behavior for UNION is that duplicate rows are
 removed from the result."

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:


Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1   | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|  1   | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1   | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2   | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2   | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2   | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:


Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS 

Re: Query Optimization

2011-09-01 Thread Jochem van Dieten
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote:
 SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a
connection was opened and is always going to be less than close_dt. Right?

 WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:
WHERE
  open_dt >= '2011-08-30 00:00:00'
  AND
  close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem


Re: Query Optimization

2011-09-01 Thread Brandon Phelps

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA( sc.src_address ) AS src_address,
 sc.src_port,
 INET_NTOA( sc.dst_address ) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
  FROM firewall_connections AS sc

 If this is a firewall connection log I presume open_dt is the time a
 connection was opened and is always going to be less than close_dt.
 Right?

  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

 In that case your logic here simplifies to:
 WHERE
   open_dt >= '2011-08-30 00:00:00'
   AND
   close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve 
results where the open_dt time is out of the range specified.  For 
example, a very large file download might span multiple days so given 
your logic if the connection was started 2 days ago and I want to pull 1 
days worth of connections, I would miss that entry.  Basically I want to 
SELECT all of the records that were opened OR closed during the 
specified time period, ie. if any activity happened between my start and 
end dates, I need to see that record.


Any other ideas?

Thanks again,
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2011-09-01 Thread Shawn Green (MySQL)

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
   ...

   WHERE
   (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
   AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

  In that case your logic here simplifies to:
  WHERE
  open_dt >= '2011-08-30 00:00:00'
  AND
  close_dt <= '2011-08-30 12:36:53'

  Now add an index over open_dt and close_dt and see what happens.

  Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query 
window with two markers (s) and (e).  Events will be marked by |---| 
markers showing their durations.

a)        (s)              (e)
b) |---|
c)       |---|
d)           |---|
e)      |--------------------|
f)                       |---|
g)                           |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time 
and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be 
part of your results.
c) the event starts before the window but ends within the window - 
include this

d) the event starts and ends within the window  - include this
e) the event starts before the window and ends after the window - 
include this
f) the event starts inside the window but ends beyond the window - 
include this.

g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need 
for a WHERE clause


WHERE start <= (ending time) and end >= (starting time)

Try that and let us know the results.
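Shawn's overlap predicate can be sanity-checked against the six events in the diagram (toy data, sqlite3 used purely for illustration):

```python
import sqlite3

# Toy connection log: one row per event b-g from the ASCII diagram,
# relative to the window 2011-08-30 00:00:00 .. 2011-08-30 12:36:53.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (label TEXT, open_dt TEXT, close_dt TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("b", "2011-08-29 01:00:00", "2011-08-29 02:00:00"),  # before the window
    ("c", "2011-08-29 23:00:00", "2011-08-30 01:00:00"),  # straddles the start
    ("d", "2011-08-30 02:00:00", "2011-08-30 03:00:00"),  # inside the window
    ("e", "2011-08-29 23:00:00", "2011-08-30 13:00:00"),  # spans the whole window
    ("f", "2011-08-30 12:00:00", "2011-08-30 14:00:00"),  # straddles the end
    ("g", "2011-08-30 13:00:00", "2011-08-30 14:00:00"),  # after the window
])

def in_window(start, end):
    # Overlap test: the event started before the window ends
    # AND ended after the window starts.
    rows = conn.execute(
        "SELECT label FROM events "
        "WHERE open_dt <= ? AND close_dt >= ? ORDER BY label",
        (end, start)).fetchall()
    return [r[0] for r in rows]
```

Events c through f come back; b and g are excluded, exactly as the diagram describes.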
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2011-09-01 Thread Brandon Phelps

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  ...

  WHERE
  (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, i.e. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will set up a query window 
with two markers, (s) and (e). Events will be marked by |---| markers showing 
their durations.

a)        (s)            (e)
b) |---|
c)      |-----|
d)             |-----|
e)      |------------------|
f)                 |--------|
g)                            |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting time and (e) 
is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of 
your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a 
WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.
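To sanity-check Shawn's overlap predicate, here is a minimal sketch using Python's built-in SQLite with made-up integer timestamps; the table and column names are hypothetical, not Brandon's schema, and the event positions mirror the ASCII diagram above:

```python
import sqlite3

# Hypothetical events b..g around a query window with (s)=10 and (e)=20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, open_t INTEGER, close_t INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("b", 1, 5),    # ends before the window   -> exclude
    ("c", 8, 12),   # ends inside the window   -> include
    ("d", 12, 18),  # entirely inside          -> include
    ("e", 5, 25),   # spans the whole window   -> include
    ("f", 18, 25),  # starts inside            -> include
    ("g", 22, 30),  # starts after the window  -> exclude
])

# Shawn's overlap test: start <= (ending time) AND end >= (starting time)
rows = conn.execute(
    "SELECT name FROM events WHERE open_t <= ? AND close_t >= ? ORDER BY name",
    (20, 10),
).fetchall()
print([r[0] for r in rows])  # ['c', 'd', 'e', 'f']
```

As expected, exactly the cases c through f survive, while b and g are excluded.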


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output:   (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------+------+---------+------+------+-------+
| id | select_type | table | type   | possible_keys | key  | key_len | ref  | rows | Extra |

Re: Query Optimization

2011-08-30 Thread Michael Dykman
There are a few things going on, but mainly it is the ORs that are killing
you.

As your conditions require OR to examine two distinct columns, both of equal
relevance to the query, MySQL is left with no choice but to do a full table
scan on what might be (at a guess) a very large table.  No amount of indexing
will fix this for the query as presented.

You would be better off writing it as two distinct queries, each concerned
with conditions on a single column (open_dt and close_dt) and then UNIONing
the results. In this form, the indexes have a chance of being engaged.

Once the indexes are engaged, you probably want to convert your DATE/DATETIME
strings into actual DATEs or DATETIMEs, thus:

 ...
   (open_dt >= DATE('2011-08-30 00:00:00'))
 ...

In its current state, the DATE fields are being converted to strings
implicitly for every row tested, which further frustrates index usage, as the
index is against the quasi-numeric DATE, not the string representation which
your current implementation appears to expect.  This query would also
suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by
an admin or a future release of MySQL.  The explicit casting I have
suggested will protect you against that.

 - michael dykman
On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote:

 Hello,

 I have the following query I'd like to optimize a bit:

 SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
 ORDER BY rcvd DESC
 LIMIT 0 , 10

 Currently I have an index on the rcvd column which seems to be working
 based on the output of EXPLAIN:

 id  select_type  table  type    possible_keys  key       key_len  ref                 rows  Extra
 1   SIMPLE       sc     index   open_dt        ndx_rcvd  4        NULL                10    Using where
 1   SIMPLE       spm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.src_port  1
 1   SIMPLE       dpm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.dst_port  1

 However the query is still fairly slow for some reason, any ideas how I
 could speed it up with additional indexes, etc?

 The values I am using in the WHERE clause are variable and change each
 time.

 The table has around 23 million records right now but is expected to
 continue to grow up to a potential 150 million.

 Here is the table schema:
 CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
 ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
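The schema stores src_address and dst_address as INT UNSIGNED, which is why the queries wrap them in INET_NTOA(). Outside MySQL the same conversion can be done with the standard library; this small sketch mirrors INET_ATON/INET_NTOA in Python (function names are mine, not MySQL's):

```python
import socket
import struct

def inet_aton_int(ip: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (like INET_ATON)."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def inet_ntoa_int(n: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad IPv4 string (like INET_NTOA)."""
    return socket.inet_ntoa(struct.pack("!I", n))

print(inet_aton_int("192.168.1.1"))  # 3232235777
print(inet_ntoa_int(3232235777))     # 192.168.1.1
```

Storing addresses as integers keeps the index small and makes range scans over subnets possible, at the cost of converting on the way in and out.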

 Thanks in advance!

 --
 Brandon

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com




-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: Query Optimization

2011-08-10 Thread Brandon Phelps
Thanks Singer,  this took my query down to 0.0007, perfect!  I wasn't 
aware a single index of multiple columns would work when one of the 
columns was in the WHERE clause and the other in the ORDER BY clause. 
Learn something new every day I guess!
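The reason the composite index works can be demonstrated outside MySQL too. In this SQLite sketch (a stand-in for the MySQL table, with a hypothetical index name), the equality on dst_port pins the index prefix, so rows come back already ordered by close_dt and the plan needs no separate sort step, which is SQLite's analogue of MySQL's filesort:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE connections (
    id INTEGER PRIMARY KEY, dst_port INTEGER, close_dt TEXT)""")
conn.execute("CREATE INDEX ndx_port_close ON connections (dst_port, close_dt)")

# Ask the planner how it would run the WHERE + ORDER BY query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM connections WHERE dst_port = 80 "
    "ORDER BY close_dt DESC LIMIT 30"
).fetchall()
details = " ".join(row[-1] for row in plan)
print(details)
# The composite index serves both the filter and the ordering,
# so no "USE TEMP B-TREE FOR ORDER BY" (filesort equivalent) appears.
print("TEMP B-TREE" in details)
```

The same principle explains the drop from ~10 seconds to 0.0007: the filesort over a million matching rows disappears.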



On 08/10/2011 02:03 PM, Singer X.J. Wang wrote:

Try a index on (dst_port,close_dt)

On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com
mailto:bphe...@gls.com wrote:

Hello all,

I am using the query below and variations of it to query a database
with a TON of records.  Currently the database has around 11 million
records but it grows every day and should cap out at around 150 million.

I am curious if there is any way I can better optimize the below
query, as currently it takes this query around 10 seconds to run but
I am sure this will get slower and slower as the database grows.

SELECT
open_dt,
close_dt,
protocol,
INET_NTOA(src_address) AS src_address,
src_port,
INET_NTOA(dst_address) AS dst_address,
dst_port,
sent,
rcvd
FROM connections
WHERE
dst_port = 80
ORDER BY close_dt  DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the
output of EXPLAIN:

id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

--
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/mysql?unsub=w...@singerwang.com


--
The best compliment you could give Pythian for our service is a referral.



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2011-08-10 Thread Peter Brawley

On 8/10/2011 1:01 PM, Brandon Phelps wrote:

Hello all,

I am using the query below and variations of it to query a database 
with a TON of records.  Currently the database has around 11 million 
records but it grows every day and should cap out at around 150 million.


I am curious if there is any way I can better optimize the below 
query, as currently it takes this query around 10 seconds to run but I 
am sure this will get slower and slower as the database grows.


SELECT
open_dt,
close_dt,
protocol,
INET_NTOA(src_address) AS src_address,
src_port,
INET_NTOA(dst_address) AS dst_address,
dst_port,
sent,
rcvd
FROM connections
WHERE
dst_port = 80
ORDER BY close_dt  DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the 
output of EXPLAIN:


id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Did you try adding your ORDER BY argument close_dt to the index?

PB

-


Thanks in advance,



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org


