On 23/03/2006 10:07 PM, bussiere bussiere wrote:
> hi i'am making a program for formatting string,
> or
> i've added :
> #!/usr/bin/python
> # -*- coding: utf-8 -*-
>
> in the begining of my script but
>
>  str = str.replace('Ç', 'C')
>  str = str.replace('é', 'E')
>  str = str.replace('É', 'E')
>  str = str.replace('è', 'E')
>  str = str.replace('È', 'E')
>  str = str.replace('ê', 'E')
>
>
> doesn't work it put me " and , instead of remplacing é by E
>
>
> if someone have an idea it could be great

Hi, I've added some comments below ... I hope they help.
Cheers,
John

>
> regards
> Bussiere
> ps : i've added the whole script under :
> __________________________________________________________________________
[snip]
>
>     if ligneA != "":
>         str = ligneA
>         str = str.replace('a', 'A')
[snip]
>         str = str.replace('z', 'Z')
>
>         str = str.replace('ç', 'C')
>         str = str.replace('Ç', 'C')
>         str = str.replace('é', 'E')
>         str = str.replace('É', 'E')
>         str = str.replace('è', 'E')
[snip]
>         str = str.replace('Ú','U')

You can replace ALL of this upshifting and accent removal in one blow by
using the string translate() method with a suitable table.

>         str = str.replace(' ', ' ')
>         str = str.replace(' ', ' ')
>         str = str.replace(' ', ' ')

The standard Python idiom for normalising whitespace is

    strg = ' '.join(strg.split())

    >>> strg = ' ALLO BUSSIERE\tCA VA? '
    >>> strg.split()
    ['ALLO', 'BUSSIERE', 'CA', 'VA?']
    >>> ' '.join(strg.split())
    'ALLO BUSSIERE CA VA?'
    >>>

[snip]
>         if normalisation2 == "O":
>             str = str.replace('MONSIEUR', 'M')
>             str = str.replace('MR', 'M')

You need to be very careful with this approach. You are changing EVERY
occurrence of "MR" in the string, not just where it is a whole "word"
meaning "Monsieur".

Constructed example of what can go wrong:

    >>> strg = 'MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY'
    >>> strg.replace('MR', 'M')
    'M IME NAGY, 123 PRIMOSE STREET, SHAMOCK VALLEY'
    >>>

A real, non-constructed history lesson: A certain database indicated
duplicate records by having the annotation "DUP" in the surname field
e.g. "SMITH DUP". Fortunately it was detected in testing that the
so-called clean-up was causing DUPLESSIS to become PLESSIS and DUPRAT to
become RAT!

Two points here:

(1) Split up your strings into "words" or "tokens". Using strg.split() is
a start but you may need something more sophisticated e.g. "-" as an
additional token separator.

(2) Instead of writing out all those lines of code, consider putting
those substitutions in a dictionary:

    title_substitution = {
        'MONSIEUR': 'M',
        'MR': 'M',
        'MADAME': 'MME',
        # etc
        }

Next level of improvement is to read that stuff from a file.

[snip]
>
>     if normalisation4 == "O":
>         str = str.replace(';\"', ' ')
>         str = str.replace('\"', ' ')
>         str = str.replace('\'', ' ')
>         str = str.replace('-', ' ')
>         str = str.replace(',', ' ')
>         str = str.replace('\\', ' ')
>         str = str.replace('\/', ' ')
>         str = str.replace('&', ' ')
[snip]

Again, consider the string translate() method. Also, consider that some of
those characters may have some meaning that you perhaps shouldn't blow
away e.g. compare 'SMITH & WESSON' with 'SMITH ET WESSON' :-)

--
http://mail.python.org/mailman/listinfo/python-list
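
For readers who want to see the suggestions in this reply put together, here is a minimal sketch. It assumes current Python 3 syntax (str.maketrans plus str.translate on text strings), not the Python 2 byte strings of the original script. The names ACCENTED, PLAIN, PUNCTUATION, TABLE, TITLE_SUBSTITUTION and normalise are illustrative, and the tables cover only a handful of characters, so treat it as a starting point and not a complete solution.

```python
# Illustrative subset: accented letters and their unaccented replacements.
ACCENTED = "çÇéÉèÈêÊúÚ"
PLAIN = "CCEEEEEEUU"

# Punctuation to turn into spaces. '&' is deliberately left out because it
# may carry meaning (e.g. 'SMITH & WESSON').
PUNCTUATION = "\"',;\\/-"

# One table does the accent removal and punctuation blanking in a single pass.
TABLE = str.maketrans(ACCENTED + PUNCTUATION, PLAIN + " " * len(PUNCTUATION))

# Whole-word substitutions, kept in a dictionary instead of many replace() calls.
TITLE_SUBSTITUTION = {
    "MONSIEUR": "M",
    "MR": "M",
    "MADAME": "MME",
}


def normalise(strg):
    """Upshift, strip accents/punctuation, then substitute whole words only."""
    strg = strg.upper().translate(TABLE)
    # split() with no argument also collapses runs of whitespace.
    words = strg.split()
    return " ".join(TITLE_SUBSTITUTION.get(word, word) for word in words)


if __name__ == "__main__":
    print(normalise("Mr Imre Nagy, 123 Primrose Street"))
    print(normalise("  Monsieur  Jean-Félix   Duprat "))
```

Because the substitution is applied to whole tokens, "MR" inside IMRE or PRIMROSE is left alone, and because "-" is mapped to a space before splitting, it acts as an extra token separator. Loading TITLE_SUBSTITUTION from a file would be the next step.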