On 2022-06-08, Dave <d...@looktowindward.com> wrote:
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line as so:
>
>     myNewString = theString.replace("\u2014", "]") #just an example
>
> Which is what I was trying to achieve.

Here's a head-start on some characters you might want to translate,
mostly spaces, hyphens, quotation marks, and ligatures:

    def unicode_translate(s):
        return s.translate({
            8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ', 8196: ' ',
            8197: ' ', 198: 'AE', 8199: ' ', 8200: ' ', 8201: ' ',
            8202: ' ', 8203: '', 64258: 'fl', 8208: '-', 8209: '-',
            8210: '-', 8211: '-', 8212: '-', 8722: '-', 8216: "'",
            8217: "'", 8220: '"', 8221: '"', 64256: 'ff', 160: ' ',
            64260: 'ffl', 8198: ' ', 230: 'ae', 12288: ' ', 173: '',
            497: 'DZ', 498: 'Dz', 499: 'dz', 64259: 'ffi', 8230: '...',
            64257: 'fi', 64262: 'st'})

If you want to go further then the Unidecode package might be helpful:

    https://pypi.org/project/Unidecode/

--
https://mail.python.org/mailman/listinfo/python-list
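
In case it helps, here is a minimal sketch of the same idea written with
str.maketrans(), which accepts the characters themselves as keys, so the
table is easier to read than one keyed on decimal code points. The
names COMMON, filter_common_characters and sample are made up for
illustration, and the table is deliberately a short subset. One thing
to watch with the replace() approach: each added line has to call
replace() on the previous result (myNewString), not on theString again,
otherwise only the last replacement survives. A translate() table
avoids that because every mapping is applied in a single pass.

```python
# Illustrative subset only; COMMON and the function name are hypothetical.
COMMON = str.maketrans({
    "\u2019": "'",    # right single quotation mark
    "\u2014": "-",    # em dash
    "\u00a0": " ",    # no-break space
    "\ufb01": "fi",   # "fi" ligature expands to two characters
    "\u200b": "",     # zero width space maps to '' and so is dropped
})


def filter_common_characters(the_string):
    # translate() returns a new string with every mapping applied at once.
    return the_string.translate(COMMON)


sample = "Dave\u2019s \ufb01le\u00a0name\u200b"
print(filter_common_characters(sample))  # Dave's file name
```

Characters that are not in the table pass through unchanged, so the
table can grow one entry at a time as new characters turn up.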