Benjamin, any chance you can pre-process the data with an external script prior to loading into your database?
Doing this sort of manipulation in SQL may be possible, but it surely will be tricky. RegEx support in MySQL is present, but in my (limited) experience with it, it's really only good for determining whether a string matches a pattern, not manipulating strings, which is what you need to do. An alternative would be to filter in your reporting language, if you're using something like perl, ruby, PHP etc to display the data. Problem would be that performing calculations would also have to be external - no use of SUM etc in SQL since these aren't currently numbers. In either case, you're on to the same approach I would take, I think. I'd use a series of regex's, like so # remove all characters that are not digits commas periods s/[^\d,\.]//g # replace all commas with periods s/,/\./g # remove periods not used to indicate decimal place, looking for .ddd pattern # will break if you have 3 digit (or more) decimal precision! 1 while s/^(.*)\.(\d{3})(.*)/$1$2$3/ Now you should have a number that looks like one of the following: 1000000.00 1000000 1 1.00 1000.00 1000 and is therefore suitable for storage in a numeric column in MySQL. Hopefully all your currency units are the same! Dan On 10/10/06, Benjamin Bittner <[EMAIL PROTECTED]> wrote:
Hi list subscribers, i am inserting millions of product rows from csv files via LOAD DATA INFILE. Every product has a price, but this "price-strings" vary heavily. The main difference between them, is the decimal format. Sometimes it is european like this: 1.000.000,00 sometimes its american like this: 1,000,000.00 so some examples for these strings are: EUR 1,00 (meaning decimal 1.00) 1.00 € (meaning decimal 1.00) 1.000,00 EUR (meaning decimal 1000.00) EUR 1.000,00 (meaning decimal 1000.00) 1.000 EUR (meaning decimal 1000.00) 1,000 EUR (meaning decimal 1000.00) and now, i want to filter/validate that directly within the query. I think some RegEx could do the trick, but this is to much for me. Ive searched for some RegEx for validating decimals, but they allways use just one notation of a decimal. I think a good strategy for that, to filter all chars but [0-9\.,], and than to do some logic like to check how man chars after the last dot or comma (if its two you know its something like 0.00). No my problem is, i don't know where to start. Maybe with a stored procedure or something like that? I don't wanna use another language for this, because i would have to do some comprehensive update work then (selecting every row, checking the price, updating the price), and these rows get written every day, so i would have to do these updates once a day on a couple of million rows. Anyone get me in the right direction? Thanks in advance Regards Benjamin Bittner --------------------------------- Was ist Glück? Schlafen Fische überhaupt? Die Antworten gibt's auf Yahoo! Clever.
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]