Re: The invasion of the XML entities
Hi Benoit:

On Wed, 05 May 2010, Benoit Thiell wrote:
> Thanks for your help. However the problem is not with the content of
> the record, but the way bibformat handles its display. Right now we do
> not have plans to change the content of the record.

If you store entities as such in the MARC records, then they would not be easily indexable/searchable. I think it is better to expand entities upon upload and store them as UTF-8 characters, if at all possible.

Best regards
-- 
Tibor Simko
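To make the "expand entities upon upload" suggestion concrete, here is a minimal sketch in Python. It is only an illustration, not the actual upload code: the function name `expand_entities` is hypothetical, and it assumes the field value is available as a plain string before the record is stored.

```python
import html


def expand_entities(value: str) -> str:
    """Replace HTML/XML character references with the characters they denote.

    Hypothetical helper for illustration; it is not part of any existing
    upload module.
    """
    return html.unescape(value)


# First word of the title, written as numeric character references.
title = ("&#1069;&#1083;&#1077;&#1082;&#1090;&#1088;"
         "&#1086;&#1085;&#1085;&#1072;&#1103;")
print(expand_entities(title))
```

Note that `html.unescape` also expands `&amp;`, `&lt;` and `&gt;`, so a value expanded this way has to be escaped again when it is written back into XML.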
Re: The invasion of the XML entities
Hi Ferran,

Ferran Jorba wrote:
> [...]
>> However in the example attached, we have XML entities both in the
>> title and in the abstract. The abstract seems to be correctly
>> unescaped but the title remains escaped leading to some bad results
>> (Title is "Электронна яjтеория м ...").
>>
>> I don't seem to be able to find the cause of this weird behavior.
>> Maybe one of you can?
>
> Maybe this record was imported from somewhere? I've had similar cases
> from sources with a mix of different encodings. Recode is your friend.
> I've included this step in my problematic workflow:
>
>   $ recode --diacritics html..utf8 <%s >%s'
>
> In your case, it produces this Cyrillic result:
>
>   Электронна яjтеория м ...
>
> Hope it helps,

Thanks for your help. However the problem is not with the content of the record, but the way bibformat handles its display. Right now we do not have plans to change the content of the record. If we decide to go down that path then we will consider your advice.

Cheers,
Benoit.
Re: The invasion of the XML entities
Hello Benoit,

[...]
> However in the example attached, we have XML entities both in the
> title and in the abstract. The abstract seems to be correctly
> unescaped but the title remains escaped leading to some bad results
> (Title is "Электронна яjтеория м ...").
>
> I don't seem to be able to find the cause of this weird
> behavior. Maybe one of you can?

Maybe this record was imported from somewhere? I've had similar cases from sources with a mix of different encodings. Recode is your friend. I've included this step in my problematic workflow:

  $ recode --diacritics html..utf8 <%s >%s'

In your case, it produces this Cyrillic result:

  Электронна яjтеория м ...

Hope it helps,

Ferran
The invasion of the XML entities
Hi folks,

here at ADS we have a few records containing XML entities. I know that for some (all?) of the format elements it is possible to set the "escape" option to 0 in order to prevent the escaping of the strings. That's what we've been doing in order to allow the usage of HTML tags.

However in the example attached, we have XML entities both in the title and in the abstract. The abstract seems to be correctly unescaped but the title remains escaped leading to some bad results (Title is "Электронна яjтеория м ...").

I don't seem to be able to find the cause of this weird behavior. Maybe one of you can?

Cheers,
Benoit.

PHYSICS
REFEREED
ARTICLE
ARTICLE
1953CzJPh...2...18A
1953CzJPh...2...18A ADS bibcode
10.1007/BF01687975 DOI
Электронная теория металлического алюминия
Antončík, Emil
Antoncik, E
regular
Институт теоретической фиэики при Карловом университете
Czechoslovak Journal of Physics 2 18-29 1953
Czechoslovak Journal of Physics, Volume 2, Issue 1, pp.18-29
Czechoslovak Journal of Physics 2 18-29 1953
1953-12-00
1953
При помощи двух эксперименталъно установленных фактов, а именно, константы решетки и пространственной группы металлического алюминия, был вычислен энергетический спсктр валентных электронов. На основании этих вычислений были истолкованы спектры испускания мягких Х-лучей. Далее была вычислена энергия связи кристалла алюминия и исследована зависимостъ константы решетки твердых растзоров алюминия от концентрации металлических примесей.
http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=1953CzJPh...2...18A&link_type=EJOURNAL
Electronic On-line Article (HTML)
SPRINGER
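For readers following the thread, the symptom described above (one field rendered correctly, the other showing raw entities) is what one would expect if the title is escaped a second time on output while the abstract is not. The sketch below only illustrates that effect with Python's standard library. `render_field` and its `escape` flag are hypothetical stand-ins, not bibformat's actual API, and the thread does not confirm that this is the cause.

```python
import html


def render_field(value: str, escape: bool) -> str:
    """Return the string sent to the browser for a stored field value.

    Hypothetical stand-in for a format element with an "escape" option.
    """
    return html.escape(value) if escape else value


# Stored value containing numeric character references.
stored = "&#1069;&#1083;"

# With escaping on, each "&" becomes "&amp;", so the browser displays the
# literal text "&#1069;&#1083;" instead of the Cyrillic letters.
escaped_output = render_field(stored, escape=True)

# With escaping off, the references reach the browser intact and are
# displayed as the characters they denote.
raw_output = render_field(stored, escape=False)

print(escaped_output)
print(raw_output)
```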