On Mon, Aug 20, 2001 at 11:38:57PM -0400, Network wrote:
> The problem= The 'addy1' column or field (in attdata) is tainted
> with incorrect data entries ... e.g. 'addy1' should only have
> addresses stored however, some bright person(s) have managed to
> input company names in the address/addy1 field. I'm presented the
> task of fixing this data before incorporating it into my *mysql
> database*.
Ah, no validation before the data went in. That sucks. :-(
> I have chosen the following method in an attempt to get the above done.
> In my 'attdata' table I have the following fields:
>
> id INT(9) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
> comp VARCHAR(70) NULL,
> addy1 VARCHAR(50) NULL,
>
> In my 'biz' table I have these fields:
>
> cid INT(9) NULL,
> bname VARCHAR(70) NULL,
> baddy VARCHAR(25) NULL,
>
> populating biz.cid -and- biz.bname *from* attdata.id -and-
> attdata.comp using the following syntax works; insert into biz (cid,
> bname) select id, comp from attdata; <-- no need for selective
> insert; data in these fields are *not* tainted. (the above worked
> like as described)
>
> Now the immediate problem. This third field attdata.addy1 has the
> wrong data entered in in thousands of rows. e.g this is an address
> field, someone entered company names in this field on certain rows.
> I resort to using regexp within a where-clause to facilitate
> selectively extracting from attdata.addy1 then, inserting/replacing
> those entries/addresses beginning with 0-9 into the biz.baddy
> column. Below is the syntax I use and the results:
Sounds reasonable so far.
> mysql> insert into biz (baddy) select addy1 from attdata where addy1 regexp
> "^[0-9]";
> Query OK, 173012 rows affected (23.70 sec)
> Records: 173012 Duplicates: 0 Warnings: 0
>
> Everything looks ok from this output but, when I query the
> biz.baddy... nothing at all was inserted or replaced in to the
> biz.baddy column for any rows.
Hmm. I wonder if that's supposed to work as you expect it to. My gut
feeling is that it can't, and here's one reason why:
SQL databases never guarantee the order of the results returned unless
you specify an ORDER BY clause in your query. So when you run the
query to pull the addy1 data, there's no telling what order it will
end up in. And if you look at the manual:
http://www.mysql.com/doc/I/N/INSERT_SELECT.html
You'll see that the SELECT part of an INSERT ... SELECT query can NOT
contain an ORDER BY clause.
Besides that theoretical stuff, though, the real problem is that
INSERT is for inserting new records into a database table. What
you're trying to do is UPDATE existing records with new data. And
there's no INSERT ... SELECT counterpart in the UPDATE syntax.
What you probably need to do is write a program in some language
(Perl, Python, C/C++) that can do some of the work for you. It's
probably not going to be easy just using SQL statements.
You might be able to get part of the way there by copying the good
data into one table, the bad data into another, and merging the
results, but it's not the way I'd approach the problem.
> I'll not trust this task to anyone other than myself.
Then why are you asking us? :-)
Hope this helps in some way...
Jeremy
--
Jeremy D. Zawodny, <[EMAIL PROTECTED]>
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878 Fax: (408) 349-5454 Cell: (408) 685-5936
MySQL 3.23.41-max: up 4 days, processed 60,117,446 queries (139/sec. avg)
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php