Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Ouch, thanks for pointing that out, what a terrible mistake. I am aware of the performance issue, and so is the customer. But with a table that's only going to hold at most 60,000 records in 10 years, I'm not afraid it'll cause significant problems. If it gets out of hand we'll have to think of a better solution.

Once again, thanks!

John

--
/ Humanique / Web strategy and development / http://www.humanique.com/
- Humanique is looking for an experienced web developer (PHP). See the vacancy at http://www.humanique.com/ -

Jeremy Cole wrote:
> Hi John,
>
> OK, no conspiracy here. Here is your problem:
>
> 25    $qry = sprintf("SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word);
>
> You are missing the s in %s for your first string argument, which causes the query to be syntactically incorrect and fail. So your AES test is only testing how quickly you can query with a syntax error. :)
>
> After adding the s, the results I get are:
>
> ([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php
> Control test (plain/text LIKE %..%): 1.383749s
> Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s
> done
>
> Nonetheless, I'd still argue that this entire concept is doomed to terrible performance anyway.
>
> Regards,
> Jeremy
>
> [snip]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
I put it here: http://pro.datisstom.nl/tests/bench.tar.bz2

The encryption isn't really a *real* security measure, except for when somebody is stupid enough to install phpMyAdmin or anything equivalent and try to get at personal data. The problem is that the password needs to be somewhere on the application server, and if you're in, you're in. But it's a request and I'm happy to oblige. Even if it only stops them for 1 minute (which could be enough).

Regards,

John

Jeremy Cole wrote:
> Hi John,
>
> Your attachment for the php code got stripped somewhere. Can you post it somewhere (http preferable)?
>
> [snip]
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Hi John,

OK, no conspiracy here. Here is your problem:

25    $qry = sprintf("SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'", $enckey, $word);

You are missing the s in %s for your first string argument, which causes the query to be syntactically incorrect and fail. So your AES test is only testing how quickly you can query with a syntax error. :)

After adding the s, the results I get are:

([EMAIL PROTECTED]) [~/datisstom/bench]$ php -q bench.php
Control test (plain/text LIKE %..%): 1.383749s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.441944s
done

Nonetheless, I'd still argue that this entire concept is doomed to terrible performance anyway.

Regards,
Jeremy

John Kraal wrote:
> I put it here: http://pro.datisstom.nl/tests/bench.tar.bz2
>
> [snip]

--
high performance mysql consulting
www.provenscaling.com
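The effect of that missing conversion character can be sketched outside PHP as well. The following is illustrative only (not bench.php): Python's printf-style `%` formatting stands in for PHP's sprintf(), and the exact failure mode differs (Python raises an error, PHP misparses the specifier), but either way the encryption key never reaches the query string:

```python
# Sketch of the bug Jeremy found: "%" without the "s" in the first
# placeholder. Python's %-formatting is a stand-in for PHP's sprintf().
enckey, word = "secret", "lorem"

buggy = "SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%') LIKE '%%%s%%'"
fixed = "SELECT id, line FROM `encryptietest` WHERE AES_DECRYPT(`field`, '%s') LIKE '%%%s%%'"

try:
    qry = buggy % (enckey, word)
except ValueError as e:
    # "%'" is not a valid conversion, so no query is ever built
    print("buggy template fails:", e)

# With the "s" restored, "%%" escapes to a literal "%" for LIKE,
# and the key and search word are substituted in.
qry = fixed % (enckey, word)
print(qry)
```

Running the fixed template yields `... AES_DECRYPT(`field`, 'secret') LIKE '%lorem%'`, which is the query the benchmark was supposed to be timing all along.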
RE: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Have you tried reversing the order of your tests, to see if there is some influence from caching?

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
www.the-infoshop.com
www.giiexpress.com
www.etudes-marche.com

-----Original Message-----
From: John Kraal [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 23, 2007 8:51 AM
To: mysql@lists.mysql.com
Subject: Query performance plain/text versus AES_DECRYPT(): LIKE %..%

Dear you,

I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application. So we've been thinking a bit, and just started trying and benchmarking some solutions we thought up. The next one really got my attention. I created a table with 4 fields:

1. id (primary/auto_increment, not really interesting)
2. field, with encrypted data
3. md5sum (it has no special use, we benched it though)
4. line, always containing three words (the same three as encrypted)

When we started querying the table for random words (from lipsum.com), it seemed that searching in the encrypted fields was _lots_ faster. Results below:

1,000 queries per field:
~$ php -q searchtest.php
Control test (plain/text LIKE %..%): 1.409699s
Decrypt test (AES_DECRYPT() LIKE %..%): 1.226069s
done

1,000,000 queries per field:
~$ php -q searchtest.php
Control test (plain/text LIKE %..%): 155.059671s
Decrypt test (AES_DECRYPT() LIKE %..%): 137.003216s
done

Actually, the only thing I could think of to say was: well, at least it's consistent. I've attached all the files I used for this test. Edit db.inc.php (add some more lipsum if you want), execute fill.php, and then have fun with bench.php.

Does any of you know why this is, how come, etc? I'm just very curious.

Regards,

John Kraal
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Yes, I did, and shutdowns between the tests and between reversing the tests.

John

Jerry Schwartz wrote:
> Have you tried reversing the order of your tests, to see if there is some influence from caching?
>
> [snip]
Re: Query performance plain/text versus AES_DECRYPT(): LIKE %..%
Hi John,

Your attachment for the php code got stripped somewhere. Can you post it somewhere (http preferable)?

In either case it's going to result in a full table scan, so they are actually both a bad strategy long term, but they should in theory perform as you would expect, with encryption being slightly slower. Have you tried with longer strings?

What is your customer's fear with having the data in plain text? Presumably, in order to use this in your application, you will have the AES password stored in your application, and it will end up in logs (such as the slow query log) quite frequently. I would think your data can be safer and your security more effective by setting some policies which are less intrusive into the actual workings of the data, such as encrypting backups and setting system-level policies.

Regards,
Jeremy

John Kraal wrote:
> Dear you,
>
> I've been working on encrypting some data for a customer. They want their personal/sensitive information encrypted in the database, but they want to be able to search it too, through the application.
>
> [snip]
Re: Query performance.
Thanks a lot!! :D You were right, there was a bug. Upgrading to MySQL 4.1.20 solved my problem.

Daniel da Veiga wrote:
> Check http://bugs.mysql.com/bug.php?id=12915
Re: Query performance.
On 6/6/06, Eugene Kosov [EMAIL PROTECTED] wrote:

Hi, List!

I'm a little bit confused with (IMHO) poor query performance. I have a table with 1,000,000 records. The table consists of 2 service fields and a number of data fields. The service fields are status and processor_id (added for concurrent queue processing).

The question is why the updates are so slow. A query like:

UPDATE queue SET status=1 WHERE status=0 LIMIT 1;

takes about 5 seconds, while this:

SELECT * FROM queue WHERE status=0 LIMIT 1;

takes 0.01-0.02 seconds. As I can see in the process list, most of the time the query is "Searching rows for update", which is very strange. I thought UPDATE searches rows the same way SELECT does. Doesn't it? Actually, it seems like it does, because if I remove all fields except for id and status, the same queries (SELECT and UPDATE) both work quite fast.

So, why is my update query so slow? What can I do to make it work faster? Can I somehow find out what the bottleneck is here? Maybe I should increase some buffers or something else? I copied the my-huge my.cnf sample from the mysql distribution.

I'm looking forward to any help because I'm stuck with this and don't know what to do. Thanks in advance to all!

P.S.: Some table info:

mysql> show table status like 'queue'\G
           Name: queue
         Engine: InnoDB
        Version: 9
     Row_format: Dynamic
           Rows: 726423
 Avg_row_length: 159
    Data_length: 116031488
Max_data_length: NULL
   Index_length: 32555008
      Data_free: 0
 Auto_increment: 101
    Create_time: 2006-06-06 22:01:21
    Update_time: NULL
     Check_time: NULL
      Collation: koi8r_general_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 68608 kB

mysql> show indexes from queue;
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| queue |          0 | PRIMARY      |            1 | id           | A         |     1170633 |     NULL | NULL   |      | BTREE      |         |
| queue |          1 | status       |            1 | status       | A         |          18 |     NULL | NULL   |      | BTREE      |         |
| queue |          1 | processor_id |            1 | processor_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.01 sec)

/etc/my.cnf:
...
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4
...

Check http://bugs.mysql.com/bug.php?id=12915

--
Daniel da Veiga
Computer Operator - RS - Brazil
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCM/IT/P/O d-? s:- a? C++$ UBLA++ P+ L++ E--- W+++$ N o+ K- w O M- V- PS PE Y PGP- t+ 5 X+++ R+* tv b+ DI+++ D+ G+ e h+ r+ y++
------END GEEK CODE BLOCK------
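For reference, the "claim one pending row" queue pattern Eugene describes can be sketched as follows. This is purely illustrative, not his code: it uses SQLite via Python's stdlib rather than MySQL, and since stock SQLite lacks `UPDATE ... LIMIT`, the single row is claimed through a primary-key subquery (in MySQL the direct `UPDATE queue SET status=1 WHERE status=0 LIMIT 1` form applies):

```python
import sqlite3

# Illustrative sketch of claiming one pending row from a work queue.
# Table and column names mirror Eugene's queue table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, status INTEGER, processor_id INTEGER)"
)
conn.executemany("INSERT INTO queue (status) VALUES (?)", [(0,)] * 5)

# Claim exactly one pending row and stamp it with this worker's id.
cur = conn.execute(
    "UPDATE queue SET status=1, processor_id=? "
    "WHERE id = (SELECT id FROM queue WHERE status=0 LIMIT 1)",
    (42,),
)
print("claimed rows:", cur.rowcount)   # 1
pending = conn.execute("SELECT COUNT(*) FROM queue WHERE status=0").fetchone()[0]
print("still pending:", pending)       # 4
```

The point of the pattern is that each worker mutates a single row per round trip; the thread's actual slowness turned out to be the server-side bug above, not the pattern itself.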
Re: Query performance...two table design options
James Tu wrote:
> Hi:
>
> Let's say I want to store the following information.
>
> Unique ID - INT(10) autoincrement
> First Name - VARCHAR(25)
> Last Name - VARCHAR(25)
> Age - INT(3)

In general 'age' is a bad column, because you need to know what year the data was entered to calculate the current age. It is often better to store year of birth or date of birth. This may not be relevant to your application, I just wanted to mention it.

> Date - DATETIME
> Activity - VARCHAR(100)
> Data - TEXT
>
> I would be basing my queries on all columns _except_ the Data column. I.e. I would be using WHEREs with all except the Data column.

You are not telling us how much data you are planning to maintain. How big will the Data column be, on average, and how many rows/persons are we talking about? Hundreds, thousands or millions?

> My question is... which design would perform better?
>
> (Design A) Put all in one table... index all the columns that I will use WHERE with.
>
> -TABLE_ALL-
> Unique ID - INT(10) autoincrement
> First Name - VARCHAR(25)
> Last Name - VARCHAR(25)
> Age - INT(3)
> Date - DATETIME
> Activity - VARCHAR(100)
> Data - TEXT
> Indices - Unique ID, First Name, Last Name, Age, Date, Activity

You will probably not need to index all columns. If you have few rows, you don't need indexes at all, except for the primary key on the unique ID. A primary key automatically works as an index. I would probably start with only the primary key, and add indexes only when I find that some queries are too slow.

> SELECT First_Name, Last_Name, Data FROM TABLE_ALL WHERE Activity = 'draw' AND Age > 24;
>
> (Design B) Put the Data in its own separate table.
>
> -TABLE_A-
> Unique ID - INT(10) autoincrement
> First Name - VARCHAR(25)
> Last Name - VARCHAR(25)
> Age - INT(3)
> Date - DATETIME
> Activity - VARCHAR(100)
> Data_ID - INT(10)
> Indices - Unique ID, First Name, Last Name, Age, Date, Activity
>
> -TABLE_B-
> Data_ID - INT(10)
> Data - TEXT
> Index - Data_ID

This will be faster if your Data column is relatively big (several K on average; I don't know, it depends on your HW, of course). I would suggest using the unique ID from TABLE_A as a primary key in TABLE_B, and dropping Data_ID from TABLE_A.

If there are millions of rows I would normalize these tables to the extreme, something like this:

Person:   P_Id, Born
FName:    FN_Id, FirstName
LName:    LN_Id, LastName
FN_P:     FN_Id, P_Id
LN_P:     LN_Id, P_Id
Activity: A_Id, Activity
Act_P:    A_Id, P_Id
Data:     P_Id, Data

FN_P and LN_P are so-called link tables, linking names to persons in a many-to-many relation. Even further normalization would have been achieved with an additional counter column. It would be used in these tables to maintain the order of the names when a person has multiple first names or last names, so that you would have one FName row for each unique name; "Mary Jane" would be split in "Mary" and "Jane".

You could query this schema like this:

SELECT FirstName, LastName, Data
FROM Person, FName, LName, Data, Activity, FN_P, LN_P, Act_P
WHERE Person.P_Id = Data.P_Id
  AND Person.P_Id = FN_P.P_Id
  AND Person.P_Id = LN_P.P_Id
  AND Person.P_Id = Act_P.P_Id
  AND FName.FN_Id = FN_P.FN_Id
  AND LName.LN_Id = LN_P.LN_Id
  AND Activity.A_Id = Act_P.A_Id
  AND Activity = 'draw' AND Born < year(now()) - 24

...or with more explicitly formulated joins, like this:

SELECT FirstName, LastName, Data
FROM Person
  NATURAL JOIN Act_P
  NATURAL JOIN Activity
  INNER JOIN FN_P ON FN_P.P_Id = Person.P_Id
  NATURAL JOIN FName
  INNER JOIN LN_P ON LN_P.P_Id = Person.P_Id
  NATURAL JOIN LName
  LEFT JOIN Data ON Data.P_Id = Person.P_Id
WHERE Activity = 'draw' AND Born < year(now()) - 24

The NATURAL JOINs are joins based on columns with the same name in the two joined tables, see the manual. The LEFT JOIN is used in this case because some Persons may not have a corresponding row in the Data table; in that case the Data column of the result table will contain NULL. If you used an INNER JOIN in place of the LEFT JOIN here, Persons without a Data record would be omitted from the result.

> SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data FROM TABLE_A, TABLE_B WHERE Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;
>
> (Aside: Would this query give me the same results as the above query?)

Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.

--
Roger
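The equivalence James asks about in the aside can be checked with a tiny sketch. This is illustrative only (SQLite via Python standing in for MySQL, made-up sample rows, and the comparison operator in `Age > 24` is an assumption, since it was lost in the archived mail):

```python
import sqlite3

# Design A (one wide table) vs. Design B (Data split into its own table),
# using the table and column names from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_ALL (id INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT,
                        Age INTEGER, Activity TEXT, Data TEXT);
CREATE TABLE TABLE_A (id INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT,
                      Age INTEGER, Activity TEXT, Data_ID INTEGER);
CREATE TABLE TABLE_B (Data_ID INTEGER PRIMARY KEY, Data TEXT);
""")
rows = [(1, "Mary", "Smith", 30, "draw", "sketchbook"),
        (2, "John", "Doe",   20, "draw", "notepad")]
for r in rows:
    conn.execute("INSERT INTO TABLE_ALL VALUES (?,?,?,?,?,?)", r)
    conn.execute("INSERT INTO TABLE_A VALUES (?,?,?,?,?,?)",
                 (r[0], r[1], r[2], r[3], r[4], r[0]))
    conn.execute("INSERT INTO TABLE_B VALUES (?,?)", (r[0], r[5]))

a = conn.execute("SELECT First_Name, Last_Name, Data FROM TABLE_ALL "
                 "WHERE Activity='draw' AND Age > 24").fetchall()
b = conn.execute("SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data "
                 "FROM TABLE_A, TABLE_B WHERE Activity='draw' AND Age > 24 "
                 "AND TABLE_A.Data_ID = TABLE_B.Data_ID").fetchall()
print(a == b)   # True: the two designs return the same rows here
```

As Roger notes, the equivalence holds only while every TABLE_A row has a matching TABLE_B row; delete a TABLE_B row and the join silently drops that person.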
RE: Query Performance
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED]]
> Hi Folks!
>
> Here, I have the following table:
>
> CREATE TABLE `accesses` (
>   `time` varchar(15) NOT NULL default '',
>   `duration` int(10) default NULL,
>   `user` varchar(25) NOT NULL default '',
>   `ipaddr` varchar(15) NOT NULL default '',
>   `result` varchar(30) default NULL,
>   `bytes` int(10) default NULL,
>   `reqmethod` varchar(10) default NULL,
>   `urlparent` varchar(100) NOT NULL default '',
>   KEY `usuario` (`usuario`),
>   KEY `time_result` (`time`, `result`)
> );
>
> If my table has a great number of rows (something like 5 million), the
> result time is much longer:
>
> select user,
>   count(distinct concat(date_format(from_unixtime(time), '%d/%m/%Y'), '-',
>     time_format(from_unixtime(time), '%H:%i')), ipaddr, urlparent) as qtd,
>   sec_to_time(sum(duration)/1000) as duration,
>   sum(bytes) as bytes
> from acessos
> where time >= 1109646000 and time <= 1112324399
>   and result <> 'TCP_DENIED/403'
> group by user order by user;
>
> PS: explaining this select, the time_result key is a possible_key, but in
> the key field I have the NULL value (the NULL persists even if I force
> with use index()).
>
> Can anybody help me with what I can do to make this query faster
> (indexes, tuning, or a change to the table structure or the query)?
> Thank you!

The first thing I'd do is index the `result` field, as you're checking against it in the WHERE clause. However, if I remember indexing behavior correctly, that won't help if you only have a few unique values in that column. Give it a shot, though, I imagine it'd definitely help.

If that doesn't drastically improve it, I'd also look into a way around performing the date and time functions in the query. I don't know if that's possible, but depending on what this is feeding to (most likely PHP or Perl), it may be quicker to do those calculations in the wrapping script (if there is one, that is).

HTH!

--
Mike Johnson
Smarter Living, Inc.
Web Developer | www.smartertravel.com
[EMAIL PROTECTED] | (617) 886-5539
RE: Query Performance
From: Fernando Henrique Giorgetti [mailto:[EMAIL PROTECTED]]
> [snip]

Oh, I'm sorry. I read your CREATE statement too quickly the first time and didn't notice that the `time_result` index was across both `time` and `result`. In that case, indexing `result` separately may not help at all. Might be worth a shot, though, if you have the disk space and time to play around with it.

--
Mike Johnson
Re: Query Performance
Fernando Henrique Giorgetti [EMAIL PROTECTED] wrote on 04/14/2005 02:34:30 PM: Hi Folks! Here, I have the following table: CREATE TABLE `accesses` ( `time` varchar(15) NOT NULL default '', `duration` int(10) default NULL, `user` varchar(25) NOT NULL default '', `ipaddr` varchar(15) NOT NULL default '', `result` varchar(30) default NULL, `bytes` int(10) default NULL, `reqmethod` varchar(10) default NULL, `urlparent` varchar(100) NOT NULL default '', KEY `usuario` (`usuario`), KEY `time_result` (`time`, `result`) ); If my table has a great number of rows (something like 5 millions), the result time is too much longer. select user, count(distinct concat(date_format(from_unixtime(time), %d/%m/%Y), - , time_format(from_unixtime(time), %H:%i)), ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as duration, sum(bytes) as bytes from acessos where time = 1109646000 and time = 1112324399 and result TCP_DENIED/403 group by user order by user; PS: explaining this select, the time_result key is a possible_key, but, in the key field I have the NULL value (the NULL persists even if I force with use index()). Can anybody help me what can I do to make this query faster (indexes, tuning, or, change the table structure or the query). Thank you ! -- Fernando Henrique Giorgetti [EMAIL PROTECTED] Departamento de Tecnologia http://www.gruponet.com.br -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] If I unfold and indent your query I get: select user , count(distinct concat( date_format(from_unixtime(time), %d/%m/%Y) , - , time_format(from_unixtime(time), %H:%i) ) , ipaddr , urlparent ) as qtd , sec_to_time(sum(duration)/1000) as duration , sum(bytes) as bytes from acessos where time = 1109646000 and time = 1112324399 and result TCP_DENIED/403 group by user order by user; Your COUNT() operator seems to be trying to execute a COUNT((concatenated date to nearest minute), ipaddr, urlparent). 
I may have unfolded it incorrectly, but that's how it looks to me. I think you meant to put the ipaddr and urlparent fields INTO the CONCAT(), but I am just working from what I got.

There is a faster way to compute time to the nearest minute than the string conversions you are doing. Just do an integer division of your TIME value by 60 and throw away the remainder, like this:

    time DIV 60

or like this:

    FLOOR(time/60)

(http://dev.mysql.com/doc/mysql/en/arithmetic-functions.html)

If I understand your COUNT(DISTINCT ...) statement correctly, you want to know in how many different minutes the user used either a unique ipaddr or a different urlparent. Am I close?

You can also compute result <> 'TCP_DENIED/403' as the OR of two ranges (which may end up using the index, or it may not...). So this could be a valid revision of your original query:

    select user
         , count(distinct concat(
               FLOOR(time/60)
             , ipaddr
             , urlparent
             )
           ) as qtd
         , sec_to_time(sum(duration)/1000) as duration
         , sum(bytes) as bytes
      from acessos
     where time >= 1109646000
       and time <= 1112324399
       and (result < 'TCP_DENIED/403' OR result > 'TCP_DENIED/403')
     group by user;

Note: GROUP BY includes a free ORDER BY unless you specify otherwise.

HTH,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
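As a quick sanity check of the minute-bucketing arithmetic (using the epoch-second boundary values from the original post), two timestamps within the same minute produce the same bucket, and the next minute starts a new one:

```sql
-- 1109646000 and 1109646059 fall in the same minute bucket;
-- 1109646060 starts the next one.
SELECT FLOOR(1109646000/60),  -- 18494100
       FLOOR(1109646059/60),  -- 18494100
       FLOOR(1109646060/60);  -- 18494101
```

`time DIV 60` gives the same result for non-negative values and avoids the floating-point division entirely.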
Re: Query Performance
You could probably save a bit of processing time by changing:

    concat(date_format(from_unixtime(time), '%d/%m/%Y'), '-',
           time_format(from_unixtime(time), '%H:%i'))

to:

    date_format(from_unixtime(time), '%d/%m/%Y - %H:%i')

This would mean half as many date conversions would be executed. Separating out the 'time' and 'result' indices will probably help too.

Cheers,
Andrew

On 14/4/05 6:34 pm, Fernando Henrique Giorgetti [EMAIL PROTECTED] wrote:
[snip]
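Separating the indices as suggested above might look like the following sketch; the index names are illustrative, and the table/column names are taken from the original post:

```sql
-- Hedged sketch: replace the composite (time, result) index with
-- standalone indexes, so the optimizer can use the time range scan
-- independently of the result filter. Index names are made up here.
ALTER TABLE accesses DROP INDEX time_result;
ALTER TABLE accesses ADD INDEX idx_time (time);
ALTER TABLE accesses ADD INDEX idx_result (result);
```

Whether this actually beats the composite index depends on the data distribution, so it's worth comparing EXPLAIN output before and after.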
Re: query performance
If you do, I suggest you also include the relevant table definitions, possibly a little sample data (plus an indication of total table sizes), and the expected output; this will greatly assist anyone who may be able to help. Oh yes, and don't forget to state the version of MySQL you are running.

Coz

On Wed, 16 Feb 2005 18:22:11 -0700, Ryan McCullough [EMAIL PROTECTED] wrote:

Can I post a query to this list and ask for help optimizing it?

--
Ryan McCullough
mailto:[EMAIL PROTECTED]

--
CozWeb Solutions Ltd
http://www.cozweb.net
Re: query performance
At 07:10 PM 1/23/2004, Larry Brown wrote:

I have a db that had some 20,000 records or so in it. I have a query to find out how many jobs have been input during the current day. To count them I ran the following query...

    select count(idnumber) from maintable
     where inputdatetime >= '$date 00:00:00' and client = 'smith'

$date is the current date in CCYY-MM-DD fashion, and the query runs. However, it seems fairly slow. I have now added some 100,000+ records from a merge I performed, and now it takes a really long time. Is there a better way to query this that will take a load off the machine? The only key in the table is the idnumber. I don't really know anything about how keys help or when to/not to use them, other than their being a necessity for an auto_increment field.

TIA

Larry

Larry,

Add two indexes, one for InputDateTime and another for Client. You should read up on MySQL. Try Paul DuBois's book MySQL, 2nd Edition, because it starts off really easy with stuff like this, and by the time you're done, you're an expert.

Mike
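The two indexes Mike suggests could be added like this (the index names are illustrative; table and column names come from Larry's query):

```sql
-- Hedged sketch: one index per filtered column, so the WHERE clause
-- can be resolved from an index instead of a full table scan.
ALTER TABLE maintable ADD INDEX idx_inputdatetime (inputdatetime);
ALTER TABLE maintable ADD INDEX idx_client (client);
```

Since the query filters on both columns at once, a single composite index on (client, inputdatetime) could also be worth testing, with the equality column first and the range column second.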
RE: Query performance
I have 2 indexes on this table:

- one unique index on user_id and att_id (pk)
- one index on att_id and user_id

I need to run the following query:

    select value from user_att where att_id = ? and value like '?'  (no wildcard)

1. When I do an EXPLAIN, this query uses the second index. But if I change my second index to (att_id, value), will the performance improve?

You could add it as a third index and see which works better, but on 18 million rows that's probably going to take quite a bit of time. From looking at your query, it seems like it would be a better index than the current one.

2. What is the difference if I change the query to

    select value from user_att where att_id = ? and lower(value) = lower('?')

Will this query be slower?

I could be wrong, but I believe the query won't use the index if you use lower(). Run an EXPLAIN on this query and see.

3. When comparing strings, is MySQL case sensitive? It seems that it is case-insensitive. If it is case-insensitive, will the following query be faster?

    select value from user_att where att_id = ? and value = '?'

MySQL is only case sensitive on binary and blob fields.
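The point about LOWER() can be checked directly with EXPLAIN. A hedged illustration, assuming the (att_id, value) index exists and using made-up literal values in place of the placeholders:

```sql
-- Wrapping the indexed column in a function generally prevents MySQL
-- from using the index on that column, so only att_id can be used:
EXPLAIN SELECT value FROM user_att
 WHERE att_id = 42 AND LOWER(value) = LOWER('Foo');

-- A direct comparison can use both columns of an (att_id, value) index:
EXPLAIN SELECT value FROM user_att
 WHERE att_id = 42 AND value = 'Foo';
```

And since the default collations are case-insensitive anyway, the plain `=` comparison already matches regardless of case, making the LOWER() calls unnecessary for non-binary columns.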