Hello Benoit,

[...]
> However in the example attached, we have XML entities both in the
> title and in the abstract. The abstract seems to be correctly
> unescaped but the title remains escaped leading to some bad results
> (Title is
> "Электронна
> яjтеория м ...").
>
> I don't seem to be able to find the cause of this weird
> behavior. Maybe one of you can?

Maybe this record was imported from somewhere?  I've had similar cases
from sources with a mix of different encodings.  Recode is your friend.
I've added this step to my own workflow for such problematic records:

 $ recode --diacritics html..utf8 <%s >%s

In your case, it produces this Cyrillic result:

 Электронна
 яjтеория м ...
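
If you would rather do the same thing from Python than shell out to
recode, here is a minimal sketch using only the standard library's
html.unescape.  The function name unescape_fully and the sample strings
are made up for illustration, and the repeated pass is only an
assumption on my side (it helps when a field was escaped more than
once); I have not checked that this is what happened to your title.

```python
import html


def unescape_fully(value: str, max_passes: int = 3) -> str:
    """Decode HTML/XML character references until the text stops changing.

    A single html.unescape() call is enough for normally escaped text.
    The extra passes only matter for fields that were escaped twice,
    e.g. "&amp;#1069;" instead of "&#1069;".  Caveat: text that is
    *meant* to contain a literal "&amp;lt;" would be over-decoded.
    """
    for _ in range(max_passes):
        decoded = html.unescape(value)
        if decoded == value:
            break  # nothing left to decode
        value = decoded
    return value


if __name__ == "__main__":
    # Hypothetical sample: numeric references for Cyrillic letters.
    escaped_title = (
        "&#1069;&#1083;&#1077;&#1082;&#1090;"
        "&#1088;&#1086;&#1085;&#1085;&#1072;"
    )
    print(unescape_fully(escaped_title))
```

Capping the number of passes keeps the loop bounded; whether to run it
on the title only or on every field is up to you.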

Hope it helps,

Ferran
