Re: The invasion of the XML entities

2010-05-05 Thread Tibor Simko
Hi Benoit:

On Wed, 05 May 2010, Benoit Thiell wrote:
> Thanks for your help. However the problem is not with the content of
> the record, but the way bibformat handles its display. Right now we do
> not have plans to change the content of the record.

If you store entities as such in the MARC records, they will not be
easily indexable or searchable.  I think it is better to expand entities
upon upload and store them as UTF-8 characters, if at all possible.
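
A minimal sketch of what expanding on upload could look like, assuming
Python 3 and its standard html module.  The function name and the sample
value are made up for illustration; this is not Invenio's upload code.

```python
import html


def expand_entities(value: str) -> str:
    """Turn character references such as '&#1069;' into real characters."""
    return html.unescape(value)


# Hypothetical field value as it might arrive before upload.
raw_title = (
    "&#1069;&#1083;&#1077;&#1082;&#1090;&#1088;"
    "&#1086;&#1085;&#1085;&#1072;&#1103;"
)
stored_title = expand_entities(raw_title)

print(stored_title)
# A plain substring search only matches once the entities are expanded.
print("Электрон" in raw_title)
print("Электрон" in stored_title)
```

Note that html.unescape also expands &amp; and &lt;, so it should be
applied to individual field values and not to a whole XML document.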

Best regards
-- 
Tibor Simko


Re: The invasion of the XML entities

2010-05-05 Thread Benoit Thiell

Hi Ferran,

Ferran Jorba wrote:
> [...]
> > However in the example attached, we have XML entities both in the
> > title and in the abstract. The abstract seems to be correctly
> > unescaped but the title remains escaped leading to some bad results
> > (Title is
> > "Электронна
> > яjтеория м ...").
> >
> > I don't seem to be able to find the cause of this weird
> > behavior. Maybe one of you can?
>
> Maybe this record was imported from somewhere?  I've had similar cases
> from sources with a mix of different encodings.  Recode is your friend.
> I've included this step in my problematic workflow:
>
>  $ recode --diacritics html..utf8 <%s >%s
>
> In your case, it produces this Cyrillic result:
>
>  Электронна
>  яjтеория м ...
>
> Hope it helps,


Thanks for your help. However the problem is not with the content of the 
record, but the way bibformat handles its display. Right now we do not 
have plans to change the content of the record. If we decide to go down 
that path then we will consider your advice.


Cheers,
Benoit.



Re: The invasion of the XML entities

2010-05-05 Thread Ferran Jorba
Hello Benoit,

[...]
> However in the example attached, we have XML entities both in the
> title and in the abstract. The abstract seems to be correctly
> unescaped but the title remains escaped leading to some bad results
> (Title is
> "Электронна
> яjтеория м ...").
>
> I don't seem to be able to find the cause of this weird
> behavior. Maybe one of you can?

Maybe this record was imported from somewhere?  I've had similar cases
from sources with a mix of different encodings.  Recode is your friend.
I've included this step in my problematic workflow:

 $ recode --diacritics html..utf8 <%s >%s

In your case, it produces this Cyrillic result:

 Электронна
 яjтеория м ...
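
Where recode is not available, a similar step can be sketched in Python.
This is a different technique from recode's html..utf8 conversion: it is
an assumption-laden regex pass that expands only non-ASCII numeric
character references and leaves &amp;, &lt; and ASCII references alone,
so that XML input stays well-formed.  The function name and sample
string are hypothetical.

```python
import re

# Matches decimal (&#1090;) and hexadecimal (&#x435;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_non_ascii_refs(text: str) -> str:
    """Replace non-ASCII numeric character references with the characters."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (e.g. &#38; for '&') so markup is not broken,
        # and leave surrogates and out-of-range values untouched.
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_replace, text)


sample = "<title>&#1090;&#x435;&#1086;&#1088;&#1080;&#1103; A &amp; B &#38; C</title>"
print(expand_non_ascii_refs(sample))
```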

Hope it helps,

Ferran


The invasion of the XML entities

2010-05-05 Thread Benoit Thiell

Hi folks,

here at ADS we have a few records containing XML entities. I know that 
for some (all?) of the format elements it is possible to set the 
"escape" option to 0 in order to prevent the escaping of the strings. 
That's what we've been doing in order to allow the usage of HTML tags.


However in the example attached, we have XML entities both in the title 
and in the abstract. The abstract seems to be correctly unescaped but 
the title remains escaped leading to some bad results (Title is 
"Электронна 
яjтеория м ...").


I don't seem to be able to find the cause of this weird behavior. Maybe 
one of you can?
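
One plausible way such a difference between two fields can arise is
sketched below.  It assumes the title is escaped on output while the
abstract is not; render() is a hypothetical stand-in and is not
bibformat's actual code, so this illustrates the symptom and does not
diagnose the cause.

```python
import html


def render(value: str, escape: int) -> str:
    """Hypothetical stand-in for a format element's output step."""
    return html.escape(value) if escape else value


stored = "&#1090;&#1077;&#1086;&#1088;&#1080;&#1103;"

# With escaping on, '&' becomes '&amp;', so a browser displays the
# reference literally instead of the character it stands for.
print(render(stored, escape=1))

# With escaping off, the reference reaches the browser unchanged and
# is displayed as Cyrillic text.
print(render(stored, escape=0))
```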


Cheers,
Benoit.

  
PHYSICS
  
  
REFEREED
  
  
ARTICLE
  
  
ARTICLE
  
  
1953CzJPh...2...18A
  
  
1953CzJPh...2...18A
ADS bibcode
  
  
10.1007/BF01687975
DOI
  
  
Электронная теория металлического алюминия
  
  
Antončík, Emil
Antoncik, E
regular
Институт теоретической фиэики при Карловом университете
  
  
Czechoslovak Journal of Physics
2
18-29
1953
Czechoslovak Journal of Physics, Volume 2, Issue 1, pp.18-29
  
  
Czechoslovak Journal of Physics
2
18-29
1953
  
  
1953-12-00
  
  
1953
  
  
При помощи двух эксперименталъно установленных фактов, а именно, константы решетки и пространственной группы металлического алюминия, был вычислен энергетический спсктр валентных электронов. На основании этих вычислений были истолкованы спектры испускания мягких Х-лучей. Далее была вычислена энергия связи кристалла алюминия и исследована зависимостъ константы решетки твердых растзоров алюминия от концентрации металлических примесей.
  
  
http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=1953CzJPh...2...18A&link_type=EJOURNAL
Electronic On-line Article (HTML)
  
  
SPRINGER