question on loading data and generating uniq table
Hi, I wanted to ask some more experienced mysql users to give me some advice on a project I am currently planning. I have a text file with three columns: strName(char6), position(integer), str(char36) This file has some 3 billion rows (3,000,000,000). There are some strs that are duplicated and eventually I want to create two tables: uniqStr and posIDX, where uniqStr has the columns ID (integer,primary) and str (uniq,char36,index) posIDX has the columns uniqStr_ID, strName,position (maybe the strName can be moved to a separate table as well and just referenced in posIDX) Now, what I would do is load the text file into a table; index on str; create the table uniqStr using a select statement; create the table posIDX by joining the first and second table. I am not sure this is the fastest way of doing things. Maybe creating the uniq sequences first using e.g. uniq on the command line would be faster? (Thereby skipping loading the first file and creating the index) The str where created using a sliding window on a few very long strings. In the very end I want to search for millions of new strs and figure out if and in which string and at which position they are located. (I am looking for exact matches) Maybe this is not even a database problem, but could be solved easier with different tools? Thanks for any advice/comment. Bernd
The = operator
Hi, I have a query where I want to retrieve all the people who are not in a specific group. Like so: SELECT id, name FROM people WHERE group_id != 3; Since people may not be in a group at all, I also need to test if the column is NULL: SELECT id, name FROM people WHERE group_id != 3 OR group_id IS NULL; Running that through EXPLAIN things look fine, but if I instead use: SELECT id, name FROM people WHERE NOT group_id = 3; I get a full table scan. Is that because is = equivalent to a function in a sense? Thanks. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: The = operator
I'm sorry for the rant, but nulls in databases make me see red! NULLS ARE GARBAGE! You are using a 'NULL-safe equal to operator', meaning, WHERE NOT group_id = 3, and oh, by the way, I DON'T WANT ANY NULLS TO QUALIFY AS 3. And yes this will be much slower, because now we can't use the index on group_id. Null is not an initial value, it is really saying that nothing has ever been put into this field, item, or what-ever. It is uninitialized and no one can say exactly what is in it, this is why it is marked as null. Using the null-safe equal to operator seems wrong, of course too me having any nulls in your data is wrong, or just plain dirty data, and I wouldn't put any faith into the results from a database that contains nulls, because the output is unpredictable if it is not very carefully coded. You (everyone in the SQL world) would be better off using the IS NULL and IS NOT NULL operators and the IFNULL() function to find all null values in their data, and initialize them to the appropriate initial value. This may require discussions with the application designers or project managers. Whom ever is responsible for these null values being your data all the frecking time? It is people that don't really understand what a null value is, and who also have the responsibility of designing a database application. So people like me who write the code need to add 'IS NOT NULL' to every conditional statement in our logic. What you want to do noted in your email below should be very simple, but only if you have CLEAN DATA. I would clean or what we call scrub your data first, then you can execute a simple and very fast select statement. Use a temporary table if you need to, just don't try to process dirty data, the end results will only make you look bad to anyone looking at the results. My $0.02, Mike. Morten wrote: Hi, I have a query where I want to retrieve all the people who are not in a specific group. Like so: SELECT id, name FROM people WHERE group_id != 3; Since people may not be in a group at all, I also need to test if the column is NULL: SELECT id, name FROM people WHERE group_id != 3 OR group_id IS NULL; Running that through EXPLAIN things look fine, but if I instead use: SELECT id, name FROM people WHERE NOT group_id = 3; I get a full table scan. Is that because is = equivalent to a function in a sense? Thanks. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Using or not using index
Running 4.1.22-standard, I have two simple MyISAM tables: Table: temp_del_ids Create Table: CREATE TABLE `temp_del_ids` ( `cust_id` int(11) NOT NULL default '0', PRIMARY KEY (`cust_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 and Table: cust_campaigns Create Table: CREATE TABLE `cust_campaigns` ( `cust_camp_id` int(11) NOT NULL auto_increment, `customer_id` int(11) NOT NULL default '0', `campaign_date` date NOT NULL default '-00-00', `campaign_name` char(255) default NULL, PRIMARY KEY (`cust_camp_id`), KEY `customer_id` (`customer_id`), KEY `campaign_date` (`campaign_date`), KEY `campaign_name` (`campaign_name`) ) ENGINE=MyISAM AUTO_INCREMENT=1415388 DEFAULT CHARSET=utf8 `temp_del_ids` has 81 records; `cust_campaigns` has 1052796. Here's my puzzlement. I checked out two queries, and I don't understand why one of them is using an index only on my small table, and the other is using indexes on both. giiexpress.com explain select * from cust_campaigns join temp_del_ids - on temp_del_ids.cust_id = cust_campaigns.customer_id\G *** 1. row *** id: 1 select_type: SIMPLE table: temp_del_ids type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 81 Extra: Using index *** 2. row *** id: 1 select_type: SIMPLE table: cust_campaigns type: ref possible_keys: customer_id key: customer_id key_len: 4 ref: giiexpr_customers.temp_del_ids.cust_id rows: 3 Extra: === giiexpress.com explain - select cust_campaigns.customer_id from - cust_campaigns join temp_del_ids - on temp_del_ids.cust_id = cust_campaigns.customer_id\G *** 1. row *** id: 1 select_type: SIMPLE table: temp_del_ids type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 81 Extra: Using index *** 2. row *** id: 1 select_type: SIMPLE table: cust_campaigns type: ref possible_keys: customer_id key: customer_id key_len: 4 ref: giiexpr_customers.temp_del_ids.cust_id rows: 3 Extra: Using index == Am I right to be surprised, or am I just misinterpreting what's going on? Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com www.giiexpress.com www.etudes-marche.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
avoiding use of Nulls (was: The = operator)
On Friday 13 March 2009 09:48:36 Michael wrote: I'm sorry for the rant, but nulls in databases make me see red! NULLS ARE GARBAGE! You are using a 'NULL-safe equal to operator', meaning, WHERE NOT group_id = 3, and oh, by the way, I DON'T WANT ANY NULLS TO QUALIFY AS 3. And yes this will be much slower, because now we can't use the index on group_id. Null is not an initial value, it is really saying that nothing has ever been put into this field, item, or what-ever. It is uninitialized and no one can say exactly what is in it, this is why it is marked as null. Using the null-safe equal to operator seems wrong, of course too me having any nulls in your data is wrong, or just plain dirty data, and I wouldn't put any faith into the results from a database that contains nulls, because the output is unpredictable if it is not very carefully coded. You (everyone in the SQL world) would be better off using the IS NULL and IS NOT NULL operators and the IFNULL() function to find all null values in their data, and initialize them to the appropriate initial value. This may require discussions with the application designers or project managers. Whom ever is responsible for these null values being your data all the frecking time? It is people that don't really understand what a null value is, and who also have the responsibility of designing a database application. So people like me who write the code need to add 'IS NOT NULL' to every conditional statement in our logic. What you want to do noted in your email below should be very simple, but only if you have CLEAN DATA. I would clean or what we call scrub your data first, then you can execute a simple and very fast select statement. Use a temporary table if you need to, just don't try to process dirty data, the end results will only make you look bad to anyone looking at the results. My $0.02, Mike. This is news to me, and I'd like to understand, so please don't take this as a flame or anything. I think I understand this part: When designing the database structure, you don't want just one table with a million columns, most of which are empty for any given row. What you do is break it up into several tables, with a relationship between the tables. Thus the term Relational database management system (RDBMS) theoretically you break up your data into tables so that there is never a non- applicable field for any given row. In the real world, this can lead to a lot of complexity, so many people usually cheat and have a few does not apply fields for some records. Now, what I've been doing is using NULL for does not apply as it seems to be more true to the data model than using some sort of fake data. I freely admit that Nulls increase the complexity of the code, but isn't it justified? I would be interested in your explanation. Ray -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: avoiding use of Nulls (was: The = operator)
On Friday 13 March 2009 09:48:36 Michael wrote: I'm sorry for the rant, but nulls in databases make me see red! NULLS ARE GARBAGE! You are using a 'NULL-safe equal to operator', meaning, WHERE NOT group_id = 3, and oh, by the way, I DON'T WANT ANY NULLS TO QUALIFY AS 3. And yes this will be much slower, because now we can't use the index on group_id. Null is not an initial value, it is really saying that nothing has ever been put into this field, item, or what-ever. It is uninitialized and no one can say exactly what is in it, this is why it is marked as null. Using the null-safe equal to operator seems wrong, of course too me having any nulls in your data is wrong, or just plain dirty data, and I wouldn't put any faith into the results from a database that contains nulls, because the output is unpredictable if it is not very carefully coded. You (everyone in the SQL world) would be better off using the IS NULL and IS NOT NULL operators and the IFNULL() function to find all null values in their data, and initialize them to the appropriate initial value. This may require discussions with the application designers or project managers. Whom ever is responsible for these null values being your data all the frecking time? It is people that don't really understand what a null value is, and who also have the responsibility of designing a database application. So people like me who write the code need to add 'IS NOT NULL' to every conditional statement in our logic. What you want to do noted in your email below should be very simple, but only if you have CLEAN DATA. I would clean or what we call scrub your data first, then you can execute a simple and very fast select statement. Use a temporary table if you need to, just don't try to process dirty data, the end results will only make you look bad to anyone looking at the results. My $0.02, Mike. This is news to me, and I'd like to understand, so please don't take this as a flame or anything. I think I understand this part: When designing the database structure, you don't want just one table with a million columns, most of which are empty for any given row. What you do is break it up into several tables, with a relationship between the tables. Thus the term Relational database management system (RDBMS) theoretically you break up your data into tables so that there is never a non- applicable field for any given row. In the real world, this can lead to a lot of complexity, so many people usually cheat and have a few does not apply fields for some records. Now, what I've been doing is using NULL for does not apply as it seems to be more true to the data model than using some sort of fake data. I freely admit that Nulls increase the complexity of the code, but isn't it justified? I would be interested in your explanation. Ray I agree with most everything you said, as far as the use of a RDBMS, and especially: quoting you, theoretically you break up your data into tables so that there is never a non-applicable field for any given row. Here is where I disagree with you, and please DON'T take this as an insult (we're just haggling over good/bad practices ), but I think it is lazy database design when you say: In the real world, this can lead to a lot of complexity, so many people usually cheat and have a few does not apply fields for some records. OK! Now here is my top 5 explanations why using Nulls is a bad idea: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Explanation(4):I think that the brilliant computer scientists who developed the RDBMS database model, didn't want to support nulls initially, but they had to because null is a form of data, it means We don't know what this is, and so it must be stored knowing that Null should NOT be used as valid information. Explanation(3):I think it was back in the sixties at Berkley, when some smart guy on LSD found a use for the Null value, and now this use of Null is taught in Universities all over the world, and considered to be a blessed method to practice. Explanation(2): Using a Null column, and you have an index on that
Re: Select query locks tables in Innodb
2009/3/12 Carl c...@etrak-plus.com: I am still a little puzzled about how we could have a relatively large set of records (100,000+) and yet not cause any table to be locked as the server has only 8GB of memory. What's the relationship you're implying between memory and locking? Multi-version concurrency doesn't necessarily mean the older versions that are being read from have to be entirely in memory. InnoDB will lock on a query that doesn't use an index. It shouldn't lock on a SELECT query, regardless of the indexes involved. - Perrin -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: avoiding use of Nulls (was: The = operator)
On Fri, 13 Mar 2009, mich...@j3ksolutions.com wrote: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Hi I'm in a argumentative mood today too. :-) I have a database logging weather data. When a station does not report a temperature, it is set to NULL. It would be a very bad idea to set it to 0 as this would ruin the whole statistics. NULL is a perfectly valid information in many cases. Cheers Thomas -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: avoiding use of Nulls (was: The = operator)
On Fri, 13 Mar 2009, mich...@j3ksolutions.com wrote: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Hi I'm in a argumentative mood today too. :-) I have a database logging weather data. When a station does not report a temperature, it is set to NULL. It would be a very bad idea to set it to 0 as this would ruin the whole statistics. NULL is a perfectly valid information in many cases. Cheers Thomas OK! I do understand, thank you. But hypothetically speaking, what value would you use if you didn't have a I don't what this is value like null? I ask this because I started programming when NULL was really zero, and part of the ASCII collating sequence. I'd use -9., I'd never allow a i don't know what it is value like Null in my database. Mike. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Getting single results per (left) record with INNER JOIN
I'm hoping someone can point me in the right direction for what I need, to save me trawling through books and Google when I don't know what I'm looking for. I'm using an INNER JOIN to query a table that has a one-to-many relationship with the table in my FROM clause, but I only want one result in the result set per record in the table in my FROM clause, rather than multiple results if there are multiple matching records in the JOINed table. I'm just not sure how to do that (or rather can't remember!). My query is: SELECT `People`.`person_id`, `People`.`name`, FROM `People` INNER JOIN `Person_postal_addresses` ON `Person_postal_addresses`.`person_id` = `People`.`person_id` WHERE `People`.`name` REGEXP 'example' OR `Person_postal_addresses`.`address` REGEXP 'example' ; How do I alter that to get one result per matching row in the People table? When there are multiple matches, for one row in People, for: `Person_postal_addresses`.`address` REGEXP '1' Thanks in advance, Nigel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: avoiding use of Nulls
ggghh mich...@j3ksolutions.com wrote: On Fri, 13 Mar 2009, mich...@j3ksolutions.com wrote: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Hi I'm in a argumentative mood today too. :-) I have a database logging weather data. When a station does not report a temperature, it is set to NULL. It would be a very bad idea to set it to 0 as this would ruin the whole statistics. NULL is a perfectly valid information in many cases. Cheers Thomas OK! I do understand, thank you. But hypothetically speaking, what value would you use if you didn't have a I don't what this is value like null? I ask this because I started programming when NULL was really zero, and part of the ASCII collating sequence. I'd use -9., I'd never allow a i don't know what it is value like Null in my database. Mike. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Getting single results per (left) record with INNER JOIN
Nigel Peck wrote: ... My query is: SELECT `People`.`person_id`, `People`.`name`, FROM `People` INNER JOIN `Person_postal_addresses` ON `Person_postal_addresses`.`person_id` = `People`.`person_id` WHERE `People`.`name` REGEXP 'example' OR `Person_postal_addresses`.`address` REGEXP 'example' ; Sorry, there was a typo in my query, should have been: SELECT `People`.`person_id`, `People`.`name` FROM `People` INNER JOIN `Person_postal_addresses` ON `Person_postal_addresses`.`person_id` = `People`.`person_id` WHERE `People`.`name` REGEXP 'example' OR `Person_postal_addresses`.`address` REGEXP 'example' ; -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: avoiding use of Nulls
mich...@j3ksolutions.com wrote: On Fri, 13 Mar 2009, mich...@j3ksolutions.com wrote: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Hi I'm in a argumentative mood today too. :-) I have a database logging weather data. When a station does not report a temperature, it is set to NULL. It would be a very bad idea to set it to 0 as this would ruin the whole statistics. NULL is a perfectly valid information in many cases. Cheers Thomas OK! I do understand, thank you. But hypothetically speaking, what value would you use if you didn't have a I don't what this is value like null? I ask this because I started programming when NULL was really zero, and part of the ASCII collating sequence. I'd use -9., I'd never allow a i don't know what it is value like Null in my database. Mike. Somewhere out there, Achilles is gaining on the turtle -- unheralded genius: A clean desk is the sign of a dull mind. - Phil Jourdan --- p...@ptahhotep.com http://www.ptahhotep.com http://www.chiccantine.com/andypantry.php -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Fwd: avoiding use of Nulls
Exactly the point. Michael, NULL *is* information. It means unknown and that is in itself useful information. A common example: A new employee is hired but which department she will work in is unknown. So the data entry person enters all the known information and leaves the rest until it has been clarified. The alternative is even worse than the so-called NULL problem and the alleged difficulty of querying against NULLable columns: in a case like that described above, the only way to handle it is to create a fake row in the foreign-key table, for Department Zero or somesuch. That immediately turns every query into a more complex beast that it would otherwise have been. Every single query must exclude this zeroth row; join a few tables all exhibiting this problem and things get really crazy. Not to mention the fact that these zeroth rows falsify reality and combine fiction with fact. Even worse, the -9 approach means that you can't impose a constraint on the column (such as must be a positive integer. And finally, I cannot believe that you really mean no NULLS ever. Surely you mean only FKs. Otherwise, how would you handle fax numbers for people with no fax, or middle names for people with none? Arthur
Re: avoiding use of Nulls
On Fri, Mar 13, 2009 at 3:20 PM, Andy Wallace awall...@cisdata.net wrote: ggghh mich...@j3ksolutions.com wrote: On Fri, 13 Mar 2009, mich...@j3ksolutions.com wrote: Explanation(5): The more you understand how the database is to be used, and the more complexity and thought you put into your database design, the less complex it will be to retrieve reliable information out of it. Furthermore, (and this is probably what makes me crazy when Nulls are evolved) after a ten year stretch of software development, where I and a team designed our own databases, I did a nine year stretch of statistical programming, using databases designed by other people, and Nulls in the data made the results unpredictable, and yeah, made me crazy! I had to write nightly processes to resolve inconsistencies in the data, if at least report inconsistencies. You know the old saying Garbage in = Garbage out, to me Nulls are garbage, and if there is a good reason for nulls to be a part of good clean data then someone please help me understand that. Hi I'm in a argumentative mood today too. :-) I have a database logging weather data. When a station does not report a temperature, it is set to NULL. It would be a very bad idea to set it to 0 as this would ruin the whole statistics. NULL is a perfectly valid information in many cases. Cheers Thomas OK! I do understand, thank you. But hypothetically speaking, what value would you use if you didn't have a I don't what this is value like null? I ask this because I started programming when NULL was really zero, and part of the ASCII collating sequence. I'd use -9., I'd never allow a i don't know what it is value like Null in my database. Mike. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=fuller.art...@gmail.com