Hello Benoit,

[...]
> However in the example attached, we have XML entities both in the
> title and in the abstract. The abstract seems to be correctly
> unescaped but the title remains escaped leading to some bad results
> (Title is
> "Электронна
> яjтеория м ...").
>
> I don't seem to be able to find the cause of this weird
> behavior. Maybe one of you can?

Maybe this record was imported from somewhere? I've had similar cases
with sources that mix different encodings. Recode is your friend. I've
included this step in my workflow for problematic records:

  $ recode --diacritics html..utf8 <%s >%s

In your case, it produces this Cyrillic result:

  Электронна яjтеория м ...

Hope it helps,
Ferran
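
P.S. If calling recode is not convenient, the same unescaping can probably be done in Python. This is only a minimal sketch, not what the import code actually does. It relies on the standard library's html.unescape, and it assumes the title is stored as numeric character references. The function name unescape_title and the sample string are made up for illustration; I built the sample by encoding the first word of the title shown above.

```python
import html


def unescape_title(raw: str) -> str:
    """Decode HTML/XML character references (e.g. &#1069; or &amp;) to Unicode."""
    return html.unescape(raw)


# Hypothetical sample: the first word of the title, written as numeric
# character references (an assumption about how the record is stored).
escaped = "&#1069;&#1083;&#1077;&#1082;&#1090;&#1088;&#1086;&#1085;&#1085;&#1072;"

print(unescape_title(escaped))  # prints: Электронна
```

Text that contains no entities passes through unchanged, so it should be safe to run this over the abstract as well.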