Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-25 Thread John Kraal

Ouch, thanks for pointing that out. What a terrible mistake.

I am aware of the performance issue, and so is the customer. But with a 
table that's only going to hold at most 60,000 records in 10 years, I'm 
not afraid it will cause significant problems. If it gets out of hand 
we'll have to think of a better solution.


Once again, thanks!

John

--
/ Humanique
/ Web strategy and development
/ http://www.humanique.com/

-
Humanique is looking for an experienced web developer (PHP).
See the vacancy at http://www.humanique.com/
-


Jeremy Cole wrote:

Hi John,

OK, no conspiracy here.  Here is your problem:

25  $qry = sprintf("SELECT id, line FROM `encryptietest` 
    WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word);


You are missing the s in %s for your first string argument, which 
causes the query to be syntactically incorrect and fail.  So your AES 
test is only testing how quickly you can query with a syntax error. :)
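For completeness, the corrected call would look something like this (a reconstruction: the archive stripped the original string quotes, and as in the benchmark itself, $word is interpolated without escaping):

```php
<?php
// Corrected line 25 of bench.php (reconstructed): the key now gets a
// proper '%s' placeholder, and %%..%% yields literal % wildcards
// around the search word once sprintf() has run.
$qry = sprintf("SELECT id, line FROM `encryptietest`
                WHERE AES_DECRYPT(`field`, '%s') LIKE '%%%s%%'",
               $enckey, $word);
```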


After adding the s, the results I get are:

 
([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php
Control test (plain/text LIKE %..%):1.383749s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s
done


Nonetheless, I'd still argue that this entire concept is doomed to 
terrible performance anyway.


Regards,

Jeremy

John Kraal wrote:

I put it here:

http://pro.datisstom.nl/tests/bench.tar.bz2

The encryption isn't really a *real* security measure, except for when 
somebody is stupid enough to install phpMyAdmin or anything equivalent 
and try to get at personal data. The problem is that the password needs 
to be somewhere on the application server, and once you're in, you're 
in. But it's a request and I'm happy to oblige, even if it only stops 
an attacker for one minute (which could be enough).


Regards,

John



Jeremy Cole wrote:

Hi John,

Your attachment with the PHP code got stripped somewhere.  Can you 
post it somewhere (HTTP preferred)?  In either case it's going to 
result in a full table scan, so both are actually a bad strategy 
long term, but in theory they should perform as you would expect, 
with encryption being slightly slower.


Have you tried with longer strings?

What is your customer's fear with having the data in plain text? 
Presumably in order to use this in your application, you will have 
the AES password stored in your application, and it will end up in 
logs (such as the slow query log) quite frequently.  I would think 
your data can be safer and your security more effective by setting 
some policies which are less intrusive into the actual workings of 
the data, such as encrypting backups and setting system-level policies.


Regards,

Jeremy

John Kraal wrote:

Dear you,

I've been working on encrypting some data for a customer. They want 
their personal/sensitive information encrypted in the database, but 
they want to be able to search it too, through the application. So 
we've been thinking a bit, and just started trying and benchmarking 
some solutions we thought up.

The next one really got my attention. I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though.)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com),
it seems that searching in the encrypted fields was _lots_ faster.
Results below:

1.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: "Well, at 
least it's consistent."


I've attached all the files I used for this test. Edit db.inc.php 
(add some more lipsum if you want), execute fill.php, and then have 
fun with bench.php.


Do any of you know why this happens? I'm just very curious.

Regards,

John Kraal



 










--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-24 Thread John Kraal

I put it here:

http://pro.datisstom.nl/tests/bench.tar.bz2

The encryption isn't really a *real* security measure, except for when 
somebody is stupid enough to install phpMyAdmin or anything equivalent 
and try to get at personal data. The problem is that the password needs 
to be somewhere on the application server, and once you're in, you're 
in. But it's a request and I'm happy to oblige, even if it only stops 
an attacker for one minute (which could be enough).


Regards,

John



Jeremy Cole wrote:

Hi John,

Your attachment with the PHP code got stripped somewhere.  Can you post 
it somewhere (HTTP preferred)?  In either case it's going to result in 
a full table scan, so both are actually a bad strategy long term, but 
in theory they should perform as you would expect, with encryption 
being slightly slower.


Have you tried with longer strings?

What is your customer's fear with having the data in plain text? 
Presumably in order to use this in your application, you will have the 
AES password stored in your application, and it will end up in logs 
(such as the slow query log) quite frequently.  I would think your data 
can be safer and your security more effective by setting some policies 
which are less intrusive into the actual workings of the data, such as 
encrypting backups and setting system-level policies.


Regards,

Jeremy

John Kraal wrote:

Dear you,

I've been working on encrypting some data for a customer. They want
their personal/sensitive information encrypted in the database, but they
want to be able to search it too, through the application. So we've been
thinking a bit, and just started trying and benchmarking some solutions
we thought up.

The next one really got my attention, I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though.)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com),
it seems that searching in the encrypted fields was _lots_ faster.
Results below:

1.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: Well, at least 
it's consistent.


I've attached all the files I used for this test. Edit db.inc.php (add 
some more lipsum if you want), execute fill.php, and then have fun 
with bench.php.


Does any of you know why this is, how come, etc? I'm just very curious.

Regards,

John Kraal













Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-24 Thread Jeremy Cole

Hi John,

OK, no conspiracy here.  Here is your problem:

25  $qry = sprintf("SELECT id, line FROM `encryptietest` 
    WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word);


You are missing the s in %s for your first string argument, which 
causes the query to be syntactically incorrect and fail.  So your AES 
test is only testing how quickly you can query with a syntax error. :)


After adding the s, the results I get are:


([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php
Control test (plain/text LIKE %..%):1.383749s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s
done


Nonetheless, I'd still argue that this entire concept is doomed to 
terrible performance anyway.


Regards,

Jeremy

John Kraal wrote:

I put it here:

http://pro.datisstom.nl/tests/bench.tar.bz2

The encryption isn't really a *real* security measure, except for when 
somebody is stupid enough to install phpMyAdmin or anything equivalent 
and try to get at personal data. The problem is that the password needs 
to be somewhere on the application server, and once you're in, you're 
in. But it's a request and I'm happy to oblige, even if it only stops 
an attacker for one minute (which could be enough).


Regards,

John



Jeremy Cole wrote:

Hi John,

Your attachment with the PHP code got stripped somewhere.  Can you post 
it somewhere (HTTP preferred)?  In either case it's going to result in 
a full table scan, so both are actually a bad strategy long term, but 
in theory they should perform as you would expect, with encryption 
being slightly slower.


Have you tried with longer strings?

What is your customer's fear with having the data in plain text? 
Presumably in order to use this in your application, you will have the 
AES password stored in your application, and it will end up in logs 
(such as the slow query log) quite frequently.  I would think your data 
can be safer and your security more effective by setting some policies 
which are less intrusive into the actual workings of the data, such as 
encrypting backups and setting system-level policies.


Regards,

Jeremy

John Kraal wrote:

Dear you,

I've been working on encrypting some data for a customer. They want
their personal/sensitive information encrypted in the database, but they
want to be able to search it too, through the application. So we've been
thinking a bit, and just started trying and benchmarking some solutions
we thought up.

The next one really got my attention, I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though.)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com),
it seems that searching in the encrypted fields was _lots_ faster.
Results below:

1.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: Well, at least 
it's consistent.


I've attached all the files I used for this test. Edit db.inc.php (add 
some more lipsum if you want), execute fill.php, and then have fun 
with bench.php.


Does any of you know why this is, how come, etc? I'm just very curious.

Regards,

John Kraal










--
high performance mysql consulting
www.provenscaling.com




RE: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-23 Thread Jerry Schwartz
Have you tried reversing the order of your tests, to see if there is some
influence from caching?

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com
www.giiexpress.com
www.etudes-marche.com

 -Original Message-
 From: John Kraal [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, October 23, 2007 8:51 AM
 To: mysql@lists.mysql.com
 Subject: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

 Dear you,

 I've been working on encrypting some data for a customer. They want 
 their personal/sensitive information encrypted in the database, but 
 they want to be able to search it too, through the application. So 
 we've been thinking a bit, and just started trying and benchmarking 
 some solutions we thought up.

 The next one really got my attention, I created a table with 4 fields:

 1. id (primary/auto_increment, not really interesting)
 2. field, with encrypted data
 3. md5sum (it has no special use, we benched it though.)
 4. line, always containing three words (the same three as encrypted)

 When we started querying the table for random words (from lipsum.com),
 it seems that searching in the encrypted fields was _lots_ faster.
 Results below:

 1.000 queries per field:

 ~$ php -q searchtest.php
 Control test (plain/text LIKE %..%):1.409699s
 Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
 done

 1.000.000 queries per field:

 ~$ php -q searchtest.php
 Control test (plain/text LIKE %..%):155.059671s
 Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
 done

 Actually, the only thing I could think of to say was: Well, at least
 it's consistent.

 I've attached all the files I used for this test. Edit db.inc.php (add
 some more lipsum if you want), execute fill.php, and then have fun with
 bench.php.

 Does any of you know why this is, how come, etc? I'm just very curious.

 Regards,

 John Kraal










Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-23 Thread John Kraal

Yes, I did, and I also shut the server down between the tests and 
between the reversed runs.



Jerry Schwartz wrote:

Have you tried reversing the order of your tests, to see if there is some
influence from caching?

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com
www.giiexpress.com
www.etudes-marche.com


-Original Message-
From: John Kraal [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 23, 2007 8:51 AM
To: mysql@lists.mysql.com
Subject: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

Dear you,

I've been working on encrypting some data for a customer. They want 
their personal/sensitive information encrypted in the database, but 
they want to be able to search it too, through the application. So 
we've been thinking a bit, and just started trying and benchmarking 
some solutions we thought up.

The next one really got my attention, I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though.)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com),
it seems that searching in the encrypted fields was _lots_ faster.
Results below:

1.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: Well, at least
it's consistent.

I've attached all the files I used for this test. Edit db.inc.php (add
some more lipsum if you want), execute fill.php, and then have fun with
bench.php.

Does any of you know why this is, how come, etc? I'm just very curious.

Regards,

John Kraal












Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

2007-10-23 Thread Jeremy Cole

Hi John,

Your attachment with the PHP code got stripped somewhere.  Can you post 
it somewhere (HTTP preferred)?  In either case it's going to result in 
a full table scan, so both are actually a bad strategy long term, but 
in theory they should perform as you would expect, with encryption 
being slightly slower.


Have you tried with longer strings?

What is your customer's fear with having the data in plain text? 
Presumably in order to use this in your application, you will have the 
AES password stored in your application, and it will end up in logs 
(such as the slow query log) quite frequently.  I would think your data 
can be safer and your security more effective by setting some policies 
which are less intrusive into the actual workings of the data, such as 
encrypting backups and setting system-level policies.


Regards,

Jeremy

John Kraal wrote:

Dear you,

I've been working on encrypting some data for a customer. They want
their personal/sensitive information encrypted in the database, but they
want to be able to search it too, through the application. So we've been
thinking a bit, and just started trying and benchmarking some solutions
we thought up.

The next one really got my attention, I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though.)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com),
it seems that searching in the encrypted fields was _lots_ faster.
Results below:

1.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1.000.000 queries per field:

~$ php -q searchtest.php
Control test (plain/text LIKE %..%):155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: Well, at least 
it's consistent.


I've attached all the files I used for this test. Edit db.inc.php (add 
some more lipsum if you want), execute fill.php, and then have fun with 
bench.php.


Does any of you know why this is, how come, etc? I'm just very curious.

Regards,

John Kraal












Re: Query performance.

2006-06-07 Thread Eugene Kosov

Thanks a lot!! :D

You were right, there was a bug. Upgrading to MySQL 4.1.20 solved my 
problem.



Daniel da Veiga wrote:

Check http://bugs.mysql.com/bug.php?id=12915






Re: Query performance.

2006-06-06 Thread Daniel da Veiga

On 6/6/06, Eugene Kosov [EMAIL PROTECTED] wrote:

Hi, List!

I'm a little bit confused by (IMHO) poor query performance.

I have a table with 1,000,000 records.
The table consists of 2 service fields and a number of data fields. The 
service fields are status and processor_id (added for concurrent queue 
processing).

The question is why are updates so slow?

A query like:

  UPDATE queue SET status=1 WHERE status=0 LIMIT 1;

takes about 5 seconds while this

  SELECT * FROM queue WHERE status=0 LIMIT 1;

takes 0.01-0.02 second.

As I can see in the process list, most of the time the query is in the 
"Searching rows for update" state, which is very strange. I thought 
UPDATE searches rows the same way SELECT does. Doesn't it? Actually, it 
seems like it does, because if I remove all fields except id and 
status, both queries (SELECT & UPDATE) work quite fast.

So, why is my update query so slow? What can I do to make it work faster?
Can I somehow find out what is the bottleneck here? May be I should
increase some buffers or something else? I copied my-huge my.cnf
sample from mysql distribution.

I'm looking forward for any help because I'm stuck with this and don't
know what to do.
Thanks in advance to all!


P.S.:

Some table info:

mysql> show table status like 'queue'\G
*************************** 1. row ***************************
           Name: queue
         Engine: InnoDB
        Version: 9
     Row_format: Dynamic
           Rows: 726423
 Avg_row_length: 159
    Data_length: 116031488
Max_data_length: NULL
   Index_length: 32555008
      Data_free: 0
 Auto_increment: 101
    Create_time: 2006-06-06 22:01:21
    Update_time: NULL
     Check_time: NULL
      Collation: koi8r_general_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 68608 kB

mysql> show indexes from queue\G
*************************** 1. row ***************************
       Table: queue
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: id
   Collation: A
 Cardinality: 1170633
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 2. row ***************************
       Table: queue
  Non_unique: 1
    Key_name: status
Seq_in_index: 1
 Column_name: status
   Collation: A
 Cardinality: 18
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 3. row ***************************
       Table: queue
  Non_unique: 1
    Key_name: processor_id
Seq_in_index: 1
 Column_name: processor_id
   Collation: A
 Cardinality: 18
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment:
3 rows in set (0.01 sec)


/etc/my.cnf:

...
[mysqld]
port= 3306
socket  = /tmp/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4
...



Check http://bugs.mysql.com/bug.php?id=12915
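Besides upgrading, a common workaround for a slow UPDATE ... LIMIT on a queue table is to pick the row with a fast SELECT and then update it by primary key, so the UPDATE itself never has to search. A sketch (assumes `id` is the primary key, as in the table above):

```sql
-- Fast: uses the status index to find one candidate row.
SELECT id FROM queue WHERE status = 0 LIMIT 1;

-- Suppose the SELECT returned id = 42; update that row by primary key,
-- re-checking status so a concurrent worker can't grab the same row.
UPDATE queue SET status = 1 WHERE id = 42 AND status = 0;
```

If the UPDATE reports zero affected rows, another worker won the race; select a new candidate and retry.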

--
Daniel da Veiga
Computer Operator - RS - Brazil
-BEGIN GEEK CODE BLOCK-
Version: 3.1
GCM/IT/P/O d-? s:- a? C++$ UBLA++ P+ L++ E--- W+++$ N o+ K- w O M- V-
PS PE Y PGP- t+ 5 X+++ R+* tv b+ DI+++ D+ G+ e h+ r+ y++
--END GEEK CODE BLOCK--




Re: Query performance...two table design options

2005-05-27 Thread Roger Baklund

James Tu wrote:

Hi:

Let's say I want to store the following information.

Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)


In general 'age' is a bad column, because you need to know what year the 
data was entered to calculate the current age. It is often better to 
store year of birth or date of birth. This may not be relevant to your 
application, I just wanted to mention it.
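As a sketch of that suggestion (illustrative names, matching the Born column used in the normalized schema later in this message):

```sql
-- Store the year of birth instead of a mutable age column.
CREATE TABLE Person (
  P_Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Born INT NOT NULL  -- year of birth; a full DATE works even better
);

-- "older than 24" without ever storing an age:
SELECT P_Id FROM Person WHERE Born < YEAR(NOW()) - 24;
```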



Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT


I would be basing my queries on all columns _except_ the Data column. I.e. I 
would be using WHERE's with all except the Data column.


You are not telling us how much data you are planning to maintain. How 
big will the Data column be, on average, and how many rows/persons are 
we talking about? Hundreds, thousands or millions?



My question is...which design would perform better?

(Design A) Put all in one table...index all the columns that I will use 
WHERE with.

-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT

Indices - Unique ID, First Name, Last Name, Age, Date, Activity


You will probably not need to index all columns. If you have few rows, 
you don't need indexes at all, except for the primary key on the unique 
ID. A primary key automatically works as an index.


I would probably start with only the primary key, and add indexes only 
when I find that some queries are too slow.
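That incremental approach could look roughly like this (the index name is illustrative):

```sql
-- Only once a real query turns out to be slow, add a matching index
-- and confirm with EXPLAIN that it is actually used.
ALTER TABLE TABLE_ALL ADD INDEX idx_activity (Activity);
EXPLAIN SELECT First_Name, Last_Name FROM TABLE_ALL
WHERE Activity = 'draw';
```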



SELECT First_Name, Last_Name, Data
FROM TABLE_ALL
WHERE
Activity = 'draw' AND Age > 24;



(Design B) Put the Data in its own separate table.
-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)

Indices - Unique ID, First Name, Last Name, Age, Date, Activity

-TABLE_B-
Data_ID - INT(10)
Data - TEXT

Index - Data_ID


This will be faster if your Data column is relatively big (several KB 
on average; it depends on your hardware, of course).


I would suggest using the unique ID from TABLE_A as a primary key in 
TABLE_B, and drop Data_ID from TABLE_A.
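In DDL form, that suggestion might look like this (a sketch based on Design B above):

```sql
-- TABLE_B reuses TABLE_A's unique ID as its own primary key, so no
-- separate Data_ID column is needed and the 1:1 link comes for free.
CREATE TABLE TABLE_B (
  id   INT(10) NOT NULL PRIMARY KEY,  -- same value as TABLE_A's ID
  Data TEXT
);

SELECT a.First_Name, a.Last_Name, b.Data
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE a.Activity = 'draw' AND a.Age > 24;
```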


If there are millions of rows I would normalize these tables to the 
extreme, something like this:


Person: P_Id,Born
FName: FN_Id,FirstName
LName: LN_Id,LastName
FN_P: FN_Id,P_id
LN_P: LN_Id,P_id
Activity: A_Id,Activity
Act_P: A_id,P_Id
Data:P_Id,Data

FN_P and LN_P are so-called link tables, linking names to persons in a 
many-to-many relation. Even further normalization could be achieved 
with an additional counter column, used in these tables to maintain the 
order of the names when a person has multiple first or last names. You 
would then have one FName row for each unique name; "Mary Jane" would 
be split into "Mary" and "Jane".


You could query this schema like this:

SELECT FirstName,LastName,Data
  FROM Person,FName,LName,Data,Activity,FN_P,LN_P,Act_P
  WHERE
Person.P_Id = Data.P_Id AND
Person.P_Id = FN_P.P_Id AND
Person.P_Id = LN_P.P_Id AND
Person.P_Id = Act_P.P_Id AND
FName.FN_Id = FN_P.FN_Id AND
LName.LN_Id = LN_P.LN_Id AND
Activity.A_Id = Act_P.A_Id AND
Activity = 'draw' and Born < year(now()) - 24

...or with more explicit formulated joins, like this:

SELECT FirstName,LastName,Data
  FROM Person
  NATURAL JOIN Act_P NATURAL JOIN Activity
  INNER JOIN FN_P ON FN_P.P_Id=Person.P_Id NATURAL JOIN FName
  INNER JOIN LN_P ON LN_P.P_Id=Person.P_Id NATURAL JOIN LName,
  LEFT JOIN Data ON Data.P_Id = Person.P_Id
  WHERE
Activity = 'draw' and Born < year(now()) - 24

The NATURAL JOINS are joins based on columns with the same name in the 
two joined tables, see the manual. The LEFT JOIN is used in this case 
because some Persons may not have a corresponding row in the Data table, 
in this case the Data column of the result table will contain NULL. If 
you used an INNER join in place of the LEFT join in this case, Persons 
without a Data record would be omitted from the result.



SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data
FROM TABLE_A, TABLE_B
WHERE
Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;
(Aside: Would this query give me the same results as the above query?)


Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.

--
Roger





RE: Query Performance

2005-04-14 Thread Mike Johnson
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED] 

 Hi Folks!
 
 Here, I have the following table:
 
 CREATE TABLE `accesses` (
   `time` varchar(15) NOT NULL default '',
   `duration` int(10) default NULL,
   `user` varchar(25) NOT NULL default '',
   `ipaddr` varchar(15) NOT NULL default '',
   `result` varchar(30) default NULL,
   `bytes` int(10) default NULL,
   `reqmethod` varchar(10) default NULL,
   `urlparent` varchar(100) NOT NULL default '',
   KEY `user` (`user`),
   KEY `time_result` (`time`, `result`)
 );
 
 If my table has a large number of rows (something like 5 million), 
 queries take far too long.
 
 select user, count(distinct 
 concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', 
 time_format(from_unixtime(time), '%H:%i')), ipaddr, 
 urlparent) as qtd, sec_to_time(sum(duration)/1000) as 
 duration, sum(bytes) as bytes from acessos where time >= 
 1109646000 and time <= 1112324399 and result <> 
 'TCP_DENIED/403' group by user order by user;
 
 PS: when EXPLAINing this select, the time_result key shows up as a 
 possible key, but the key field is NULL (it stays NULL even if I 
 force it with USE INDEX()).
 
 Can anybody help me what can I do to make this query faster 
 (indexes, tuning, or, change the table structure or the query).
 
 Thank you !

The first thing I'd do is index the `result` field, as you're checking
against it in the WHERE clause. However, if I remember indexing behavior
correctly, that won't help if you only have a few unique values in
that column. Give it a shot, though; I imagine it'd definitely help.

If that doesn't drastically improve it, I'd also look into a way around
performing the date and time functions in the query. I don't know if
that's possible, but depending on what this is feeding to (most likely
PHP or Perl), it may be quicker to do those calculations in the wrapping
script (if there is one, that is).

HTH!

-- 
Mike Johnson Smarter Living, Inc.
Web Developerwww.smartertravel.com
[EMAIL PROTECTED]   (617) 886-5539




RE: Query Performance

2005-04-14 Thread Mike Johnson
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED] 

 Hi Folks!
 
 Here, I have the following table:
 
 CREATE TABLE `accesses` (
   `time` varchar(15) NOT NULL default '',
   `duration` int(10) default NULL,
   `user` varchar(25) NOT NULL default '',
   `ipaddr` varchar(15) NOT NULL default '',
   `result` varchar(30) default NULL,
   `bytes` int(10) default NULL,
   `reqmethod` varchar(10) default NULL,
   `urlparent` varchar(100) NOT NULL default '',
   KEY `user` (`user`),
   KEY `time_result` (`time`, `result`)
 );
 
 If my table has a large number of rows (something like 5 million), 
 queries take far too long.
 
 select user, count(distinct 
 concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ', 
 time_format(from_unixtime(time), '%H:%i')), ipaddr, 
 urlparent) as qtd, sec_to_time(sum(duration)/1000) as 
 duration, sum(bytes) as bytes from acessos where time >= 
 1109646000 and time <= 1112324399 and result <> 
 'TCP_DENIED/403' group by user order by user;
 
 PS: when EXPLAINing this select, the time_result key shows up as a 
 possible key, but the key field is NULL (it stays NULL even if I 
 force it with USE INDEX()).
 
 Can anybody help me what can I do to make this query faster 
 (indexes, tuning, or, change the table structure or the query).
 
 Thank you !

Oh, I'm sorry. I read your CREATE statement too quickly the first time
and didn't notice that the `time_result` index was across both `time`
and `result`. In that case, indexing `result` separately may not help at
all. Might be worth a shot, though, if you have the disk space and time
to play around with it.

-- 
Mike Johnson Smarter Living, Inc.
Web Developerwww.smartertravel.com
[EMAIL PROTECTED]   (617) 886-5539




Re: Query Performance

2005-04-14 Thread SGreen
Fernando Henrique Giorgetti [EMAIL PROTECTED] wrote on 04/14/2005 
02:34:30 PM:

 Hi Folks!
 
 Here, I have the following table:
 
 CREATE TABLE `accesses` (
   `time` varchar(15) NOT NULL default '',
   `duration` int(10) default NULL,
   `user` varchar(25) NOT NULL default '',
   `ipaddr` varchar(15) NOT NULL default '',
   `result` varchar(30) default NULL,
   `bytes` int(10) default NULL,
   `reqmethod` varchar(10) default NULL,
   `urlparent` varchar(100) NOT NULL default '',
   KEY `user` (`user`),
   KEY `time_result` (`time`, `result`)
 );
 
 If my table has a great number of rows (something like 5 million), 
 the query takes much too long.
 
 select user, count(distinct concat(date_format(from_unixtime(time), 
 '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), 
 ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as 
 duration, sum(bytes) as bytes from acessos where time >= 1109646000 
 and time <= 1112324399 and result <> 'TCP_DENIED/403' group by user 
 order by user;
 
 PS: when I EXPLAIN this select, the time_result key shows up as a 
 possible_key, but the key field is NULL (and it stays NULL even 
 if I force it with USE INDEX()).
 
 Can anybody tell me what I can do to make this query faster 
 (indexes, tuning, or a change to the table structure or the query)?
 
 Thank you !
 -- 
 Fernando Henrique Giorgetti
 [EMAIL PROTECTED]
 Departamento de Tecnologia
 http://www.gruponet.com.br
 
 

If I unfold and indent your query I get:

select user
     , count(distinct 
           concat(
               date_format(from_unixtime(time), '%d/%m/%Y')
             , ' - '
             , time_format(from_unixtime(time), '%H:%i')
           )
         , ipaddr
         , urlparent
       ) as qtd
     , sec_to_time(sum(duration)/1000) as duration
     , sum(bytes) as bytes 
from acessos 
where time >= 1109646000 
  and time <= 1112324399 
  and result <> 'TCP_DENIED/403' 
group by user 
order by user;

Your COUNT() operator seems to be trying to execute a COUNT((concatenated 
date to nearest minute), ipaddr, urlparent). I may have unfolded it 
incorrectly but that's how it seems to me. I think you meant to put the 
ipaddr and urlparent fields INTO the CONCAT() but I am just working from 
what I got.

There is a faster way to compute time to the nearest minute than what you 
are doing with the string conversions. Just do an integer division of your 
TIME value by 60 and throw away the remainder like this:

time DIV 60

or like this:
FLOOR(time/60)

(http://dev.mysql.com/doc/mysql/en/arithmetic-functions.html)
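As an aside, the equivalence is easy to check outside MySQL. Here is a small Python sketch (my own illustration, not from the thread) showing that integer division by 60 buckets a Unix timestamp to the minute exactly like the formatted day/hour:minute string does, without any string work per row:

```python
from datetime import datetime, timezone

def minute_bucket_via_format(ts: int) -> str:
    # Mirrors CONCAT(DATE_FORMAT(..., '%d/%m/%Y'), ' - ', TIME_FORMAT(..., '%H:%i'));
    # UTC is used here so the example is deterministic.
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%d/%m/%Y - %H:%M")

def minute_bucket_via_div(ts: int) -> int:
    # Mirrors `time DIV 60` / FLOOR(time/60): one integer op per row.
    return ts // 60

# Timestamps in the same minute collapse to the same bucket either way;
# a timestamp one minute later does not.
a, b, c = 1109646000, 1109646059, 1109646060
assert minute_bucket_via_format(a) == minute_bucket_via_format(b)
assert minute_bucket_via_div(a) == minute_bucket_via_div(b)
assert minute_bucket_via_div(a) != minute_bucket_via_div(c)
```

Either bucket is equally good as a DISTINCT key; the integer one just skips two date conversions and a concatenation per row.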

If I understand your COUNT(DISTINCT ...) statement correctly, you want to 
know in how many different minutes the user used either a unique 
ipaddr or a different urlparent. Am I close? You can also compute <> as 
the OR of two range comparisons (which may end up using the index or it 
may not...) So this could be a valid revision of your original query:

select user
     , count(distinct 
           concat(
               FLOOR(time/60)
             , ipaddr
             , urlparent
           )
       ) as qtd
     , sec_to_time(sum(duration)/1000) as duration
     , sum(bytes) as bytes 
from acessos 
where time >= 1109646000 
  and time <= 1112324399 
  and (
        result < 'TCP_DENIED/403' 
     OR result > 'TCP_DENIED/403'
      )
group by user;

Note: GROUP BY includes a free ORDER BY unless you specify otherwise.
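For what it's worth, the aggregation that revised query performs can be modeled off-line. This Python sketch (my own illustration, with made-up sample rows) computes the same per-user count of distinct (minute, ipaddr, urlparent) combinations:

```python
from collections import defaultdict

# Made-up sample rows: (time, user, ipaddr, urlparent)
rows = [
    (1109646000, "ana", "10.0.0.1", "http://a/"),
    (1109646030, "ana", "10.0.0.1", "http://a/"),  # same minute, same ip/url: not distinct
    (1109646060, "ana", "10.0.0.1", "http://a/"),  # next minute: a new distinct key
    (1109646000, "bob", "10.0.0.2", "http://b/"),
]

# COUNT(DISTINCT CONCAT(FLOOR(time/60), ipaddr, urlparent)) GROUP BY user,
# modeled as a set of keys per user.
qtd = defaultdict(set)
for t, user, ip, url in rows:
    qtd[user].add((t // 60, ip, url))

result = {user: len(keys) for user, keys in qtd.items()}
assert result == {"ana": 2, "bob": 1}
```

A set per group is exactly the bookkeeping COUNT(DISTINCT ...) implies, which is why shrinking the distinct key (an integer minute instead of a formatted string) saves work on millions of rows.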

HTH,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine





Re: Query Performance

2005-04-14 Thread Andrew Braithwaite
You could probably save a bit of processing time by changing:

concat(date_format(from_unixtime(time), '%d/%m/%Y'), ' - ',
time_format(from_unixtime(time), '%H:%i'))

to:

date_format(from_unixtime(time), '%d/%m/%Y - %H:%i')

This would mean half the date conversions would be executed.
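The saving is easy to demonstrate off-line; this Python sketch (my own illustration, using strftime as a stand-in for DATE_FORMAT, in UTC for determinism) shows a single formatting pass producing the same string as two passes plus a concatenation:

```python
from datetime import datetime, timezone

ts = 1109646000
dt = datetime.fromtimestamp(ts, tz=timezone.utc)

# Two formatting passes plus a concat, as in the original query...
two_calls = dt.strftime("%d/%m/%Y") + " - " + dt.strftime("%H:%M")
# ...versus one pass, as suggested above:
one_call = dt.strftime("%d/%m/%Y - %H:%M")

assert one_call == two_calls
```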

Separating out the 'time' and 'result' indices will probably help too.

Cheers,

Andrew


On 14/4/05 6:34 pm, Fernando Henrique Giorgetti [EMAIL PROTECTED]
wrote:

 Hi Folks!
 
 Here, I have the following table:
 
 CREATE TABLE `accesses` (
   `time` varchar(15) NOT NULL default '',
   `duration` int(10) default NULL,
   `user` varchar(25) NOT NULL default '',
   `ipaddr` varchar(15) NOT NULL default '',
   `result` varchar(30) default NULL,
   `bytes` int(10) default NULL,
   `reqmethod` varchar(10) default NULL,
   `urlparent` varchar(100) NOT NULL default '',
   KEY `usuario` (`usuario`),
   KEY `time_result` (`time`, `result`)
 );
 
 If my table has a great number of rows (something like 5 million), the
 query takes much too long.
 
 select user, count(distinct concat(date_format(from_unixtime(time),
 '%d/%m/%Y'), ' - ', time_format(from_unixtime(time), '%H:%i')), ipaddr,
 urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as
 bytes from acessos where time >= 1109646000 and time <= 1112324399 and result
 <> 'TCP_DENIED/403' group by user order by user;
 
 PS: when I EXPLAIN this select, the time_result key shows up as a possible_key,
 but the key field is NULL (and it stays NULL even if I force it with USE
 INDEX()).
 
 Can anybody tell me what I can do to make this query faster (indexes, tuning,
 or a change to the table structure or the query)?
 
 Thank you !






Re: query performance

2005-02-17 Thread Coz Web
If you do, I suggest you also include the relevant table definitions and
possibly a little sample data (plus an indication of total table
sizes) and the expected output; this will greatly assist anyone who may be
able to help. Oh yes, and don't forget to state the version of MySQL
you are running.

Coz


On Wed, 16 Feb 2005 18:22:11 -0700, Ryan McCullough
[EMAIL PROTECTED] wrote:
 Can I post a query to this list and ask for help optimizing it?
 
 --
 Ryan McCullough
 mailto:[EMAIL PROTECTED]
 
 
 


-- 
CozWeb Solutions Ltd
http://www.cozweb.net




Re: query performance

2004-01-23 Thread mos
At 07:10 PM 1/23/2004, Larry Brown wrote:
I have a db that had some 20,000 records or so in it.  I have a query to
find out how many jobs have been input during the current day.  To add them
up I ran the following query...
select count(idnumber) from maintable where inputdatetime > '$date
00:00:00' and client='smith'
$date is the current date in CCYY-MM-DD fashion and the query runs.  However
it seems fairly slow.  I have now added some 100,000+ records from a merge I
performed and now it takes a really long time.  Is there a better way to
query this that will take a load off the machine?  The only key in the table
is the idnumber.  I don't really know anything about how keys help or when
to/not to use them other than their being a necessity for an auto_increment
field.
TIA

Larry
Larry,
Add two indexes, one for InputDateTime and another for Client.
You should read up on MySQL. Try Paul DuBois' book MySQL, 2nd 
Edition, because it starts off really easy with stuff like this and by the 
time you're done, you're an expert.
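As a rough illustration of why those indexes help (my own sketch with made-up data, not MySQL internals): an index keeps its keys sorted, so the server can binary-search to the range boundary instead of scanning all 100,000+ rows, much like bisect on a sorted Python list:

```python
import bisect

# Made-up sample: one inputdatetime string per row. sorted() makes the
# index's key-ordering invariant explicit; an unindexed table has no
# such ordering and must be scanned end to end.
rows_sorted = sorted([
    "2004-01-22 09:15:00",
    "2004-01-23 00:30:00",
    "2004-01-23 08:00:00",
    "2004-01-23 17:45:00",
    "2004-01-24 01:00:00",
])

def count_since(rows, cutoff):
    # Binary search for the first row > cutoff: O(log n) to find the
    # boundary, like an index range scan, versus O(n) for a full scan.
    return len(rows) - bisect.bisect_right(rows, cutoff)

assert count_since(rows_sorted, "2004-01-23 00:00:00") == 4
```

A real B-tree index also has to count or fetch the qualifying rows, but finding the starting point cheaply is what makes the date-range WHERE clause fast.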

Mike 





RE: Query performance

2003-09-19 Thread Jennifer Goodie
 2 index on this table:
  - one unique index on user_id and att_id (pk)
  - one index on att_id and user_id.

 I need to have the following query:

 select value from user_att where att_id = ? and value like '?'
 (no wildcard)
 1. when I do an explain, this query uses the second index. But if
 I change my
 second index to att_id and value, will the performance improve?

You could add it as a third index and see which works better, but on 18
million rows that's probably going to take quite a bit of time.  From
looking at your query, it seems like it would be a better index than the
current one.

 2. what is the difference if I change the query to
select value from user_att where att_id = ? and lower(value) =
 lower('?')
will this query be slower?
I could be wrong, but I believe the query won't use the index if you use
lower().  Run an explain on this query and see.

 3. when comparing strings, is MySQL case sensitive? It seems that it is
 case-insensitive. If it is case-insensitive, will the following query
 be faster?
select value from user_att where att_id = ? and value = '?'

MySQL is only case-sensitive on binary and blob fields.
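As a loose analogy (my own, not how MySQL collations are actually implemented): a case-insensitive comparison behaves as if both sides were case-folded before comparing, so the two spellings match without any lower() in the query:

```python
def ci_equal(a: str, b: str) -> bool:
    # Rough model of a case-insensitive collation: fold case, then compare.
    # (Real collations also handle accents and multi-byte character rules.)
    return a.casefold() == b.casefold()

assert ci_equal("TCP_DENIED/403", "tcp_denied/403")
assert not ci_equal("value", "values")
```

This is also why wrapping the column in lower() buys nothing on a case-insensitive column, while it can stop the optimizer from using an index on that column.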




