Author handling bfe_authors.py et al

2011-11-25 Thread Alexander Wagner

Hi!

Currently, bfe_authors.py uses

authors = []
authors_1 = bfo.fields('100__')
authors_2 = bfo.fields('700__')

This means authors are only picked up if both indicators are
blank. However, according to the authoritative definition at

   http://www.loc.gov/marc/bibliographic/bd100.html

one should actually set at least indicator 1, like

   100 0_  $aWinston Churchill
   100 1_  $aChurchill, Winston

to distinguish storage of firstname lastname vs. lastname, firstname,
not to mention stuff like

   100 3_ $aFarquhar family

Given that foreign data we upload comes with a lot of indicators set
here, I suggest changing the author-handling code to use

authors = []
authors_1 = bfo.fields('100%%')
authors_2 = bfo.fields('700%%')

wherever it is applicable. This is also relevant for indexing, where the
default definition is __ as well.
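
To make the difference concrete, here is a minimal stand-alone sketch of the
matching rule as I understand it: `_` stands for a blank indicator and `%`
for any single character. The helper `spec_matches` is made up for
illustration; it is not Invenio code and only mimics the behaviour
described above.

```python
def spec_matches(spec, tag, ind1=" ", ind2=" "):
    """Return True if a 5-character field spec such as '100__' or
    '100%%' matches the given tag and indicators.

    Assumption: '_' means a blank indicator, '%' means any character.
    """
    # Normalise blank indicators to '_' so both spellings compare equal.
    actual = tag + (ind1 if ind1.strip() else "_") + (ind2 if ind2.strip() else "_")
    if len(spec) != len(actual):
        return False
    return all(s == "%" or s == a for s, a in zip(spec, actual))


# Three sample author fields: (tag, indicator 1, indicator 2).
fields = [("100", "1", " "), ("100", " ", " "), ("700", "0", " ")]

# Current behaviour: only fields with both indicators blank are kept.
strict = [f for f in fields
          if spec_matches("100__", *f) or spec_matches("700__", *f)]

# Suggested behaviour: indicators are ignored.
relaxed = [f for f in fields
           if spec_matches("100%%", *f) or spec_matches("700%%", *f)]
```

With the strict specs only the second sample field survives; with the
wildcard specs all three do.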

I just got some 50,000 anonymous papers, which I'll now give back to
their authors ;)

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi



Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDirig Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt




Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hello Alexander,

> Currently, bfe_authors.py uses
>
> authors = []
> authors_1 = bfo.fields('100__')
> authors_2 = bfo.fields('700__')
>
> This means authors are only picked up if both indicators are
> blank. However, according to the authoritative definition at
>
> http://www.loc.gov/marc/bibliographic/bd100.html

This is an example of the default values of Invenio not being MARC21
compliant.  I've complained about this a few times, and I should have
filed a task about those defaults and sent a few patches, although I
haven't yet ;-(.

The incorrect default values have this consequence: if you import
records from another catalog, those values misbehave in Invenio.
Librarians (and library-educated computer people) expect those records
to behave as they do in the other system.  So it is a matter of
interoperability and economy.

The problem arises not only in those Python bibformat snippets, but also
in the bibindex definitions and in all export formats.  For example, my
bfe_author.py collects these fields:
 
if (authors_type in ['','personal']):
    authors.extend(bfo.fields('100%%'))
if (authors_type in ['','corporate']):
    authors.extend(bfo.fields('110%%'))
if (authors_type in ['','meeting']):
    authors.extend(bfo.fields('111%%'))
if (authors_type in ['','personal']):
    authors.extend(bfo.fields('700%%'))
if (authors_type in ['','corporate']):
    authors.extend(bfo.fields('710%%'))
if (authors_type in ['','meeting']):
    authors.extend(bfo.fields('711%%'))
if (authors_type in ['','personal','corporate','meeting']):
    authors.extend(bfo.fields('720%%'))

(In my case I sometimes need to show only personal or only corporate
authors, depending on the collection; I understand that this is not
always the case.)
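
The same selection can also be written in a table-driven way. This is
only a sketch: `AUTHOR_FIELDS`, `collect_authors` and the `FakeBfo`
stand-in are names invented here for illustration, and the only thing
assumed about the real BibFormat object is the `bfo.fields(spec)` call
already used above.

```python
# (field spec, author types that include it), in the same order as above.
AUTHOR_FIELDS = [
    ("100%%", ("personal",)),
    ("110%%", ("corporate",)),
    ("111%%", ("meeting",)),
    ("700%%", ("personal",)),
    ("710%%", ("corporate",)),
    ("711%%", ("meeting",)),
    ("720%%", ("personal", "corporate", "meeting")),
]


def collect_authors(bfo, authors_type=""):
    """Gather author fields; an empty authors_type selects every kind."""
    authors = []
    for spec, kinds in AUTHOR_FIELDS:
        if authors_type == "" or authors_type in kinds:
            authors.extend(bfo.fields(spec))
    return authors


class FakeBfo(object):
    """Hypothetical stand-in for the BibFormat object, for testing only."""

    def __init__(self, data):
        self.data = data

    def fields(self, spec):
        # Return the canned fields for this spec, or nothing.
        return self.data.get(spec, [])


bfo = FakeBfo({
    "100%%": [{"a": "Churchill, Winston"}],
    "710%%": [{"a": "Some University"}],
    "720%%": [{"a": "Uncontrolled Name"}],
})
personal_only = collect_authors(bfo, "personal")
everyone = collect_authors(bfo)
```

Adding another tag then means adding one line to the table instead of
another `if` block.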

In bibindex (/admin/bibindex/bibindexadmin.py/field), you also have to
add those fields:

 author 100%, 110%, 111%, 700%, 710%, 711%, 720%

The same goes for the MARCXML to DC XSL stylesheet:

  <xsl:for-each select="datafield[(@tag=100 or @tag=110 or @tag=111)]">
    <dc:creator>
      <xsl:call-template name="subfieldSelect">
        <xsl:with-param name="codes">ab</xsl:with-param>
      </xsl:call-template>
    </dc:creator>
  </xsl:for-each>

  <xsl:for-each select="datafield[(@tag=700 or @tag=710 or @tag=711 or @tag=720)]">
    <dc:contributor>
      <xsl:call-template name="subfieldSelect">
        <xsl:with-param name="codes">ab</xsl:with-param>
      </xsl:call-template>
    </dc:contributor>
  </xsl:for-each>

I borrowed the following xsl function from somewhere (LC, I think):

  <!-- Added FJ 5-feb-2010 to resolve template -->
  <xsl:template name="subfieldSelect">
    <xsl:param name="codes">abcdefghijklmnopqrstuvwxyz</xsl:param>
    <xsl:param name="delimeter">
      <xsl:text> </xsl:text>
    </xsl:param>
    <xsl:variable name="str">
      <xsl:for-each select="subfield">
        <xsl:if test="contains($codes, @code)">
          <xsl:value-of select="text()"/>
          <xsl:value-of select="$delimeter"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:value-of
      select="substring($str,1,string-length($str)-string-length($delimeter))"/>
  </xsl:template>
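
For readers more at home in Python than in XSLT, here is roughly what
that template does, as a small sketch. The function name
`subfield_select` and the representation of subfields as (code, value)
pairs are choices made here for illustration only.

```python
def subfield_select(subfields, codes="abcdefghijklmnopqrstuvwxyz", delimiter=" "):
    """Join the values of the subfields whose code is listed in `codes`.

    `subfields` is a list of (code, value) pairs in record order.
    The template appends the delimiter after every value and then cuts
    off the last one; joining gives the same result.
    """
    return delimiter.join(value for code, value in subfields if code in codes)


creator = subfield_select(
    [("a", "Churchill, Winston"), ("d", "1874-1965"), ("b", "Sir")],
    codes="ab",
)
```

Here `creator` holds the $a and $b values separated by a space, and $d
is skipped because it is not in `codes`.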

And so on.  It is a major task, but much needed.  Newcomers are likely
to feel frustrated when the system does not behave as expected.

Ferran


Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hello Alexander,

>> this is an example of the default values of Invenio not being MARC21
>> compliant.
>
> Right. And then these are bad defaults.
>
>> I've complained about this a few times, and I should have
>> filed a task about those defaults, and sent a few patches, although I
>> haven't yet ;-(.

The reasons why I haven't done it myself, besides the lack-of-time issue
(a bad excuse), are these: on my instances I have a mix of
better-than-default values and local ones; I don't have (or I don't have
the resources to have) a reasonably recent Invenio instance running
anywhere (we are still at 0.99.1), so I'd be patching something old;
and, even with those restrictions, when I tried, I ran into the example
records (modules/miscutil/sql/tabfill.sql) and the testing
infrastructure, which I didn't know how to handle.  So I feel
overwhelmed each time I try ;-(

But ideally one should be able to go, for example, to
http://www.archive.org/details/ol_data, download the whole University
of Toronto Library catalog, load it into the local Invenio and use it,
maybe just adjusting some collection field values.

Right now that is not the case.  And it is a pity, because after
suitable adjustments Invenio is quite capable of handling them.  It is
even possible to have something like authority records in it (at least
we have them more-or-less working at http://traces.uab.cat/).

Best regards,

Ferran


Re: Author handling bfe_authors.py et al

2011-11-25 Thread Alexander Wagner

On 25.11.2011 11:59, Ferran Jorba wrote:

Hi!


>>> this is an example of the default values of Invenio not being MARC21
>>> compliant.
>>
>> Right. And then these are bad defaults.
>>
>>> I've complained about this a few times, and I should have
>>> filed a task about those defaults, and sent a few patches, although I
>>> haven't yet ;-(.
>
> The reasons why I haven't done it myself, besides the lack-of-time issue
> (bad excuse) are that on my instances I have a mix of
> better-than-default values and local ones;


Well, it would be great if you could drop me some sort of list in case
your previous post was not complete. We're about to roll out some
installation here based on recent Invenio so we might work that in if
it's not already done.

So the suggestion would be: give me what you have and I'll check against
current git master (probably some weeks back).

[...]

> But ideally one should be able to go, for example, to
> http://www.archive.org/details/ol_data and get and load all University
> of Toronto Library catalog in the local Invenio and use it, maybe just
> adjusting some valid collection field value.


Agreed. But I don't need to go to Toronto; I'd just start out with our
own catalogue. Still, it's cumbersome to work out again everything you
might already have found in your (local) patches.

--

Kind regards,

Alexander Wagner




Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hi again,

>> The reasons why I haven't done it myself, besides the lack-of-time issue
>> (bad excuse) are that on my instances I have a mix of
>> better-than-default values and local ones;
>
> Well, it would be great if you could drop me some sort of list in case
> your previous post was not complete. We're about to roll out some
> installation here based on recent Invenio so we might work that in if
> it's not already done.

Let me publish my logical fields list here on the list, because it is
easy and likely to be useful to most readers (I've left off a few local
fields):

 4. Logical fields overview

 Field      | MARC Tags                     | Translations
 -----------+-------------------------------+--------------------------------
 any_field  | 00%, 01%, 02%, 03%, 04%, 05%, | ca, cs, de, el, en, es, fr, it,
            | 06%, 07%, 08%, 09%, 10%, 11%, | no, pt, ru, sk, sv, uk
            | 12%, 13%, 14%, 15%, 16%, 17%, |
            | 18%, 19%, 20%, 21%, 22%, 23%, |
            | 24%, 25%, 26%, 27%, 28%, 29%, |
            | 30%, 31%, 32%, 33%, 34%, 35%, |
            | 36%, 37%, 38%, 39%, 40%, 41%, |
            | 42%, 43%, 44%, 45%, 46%, 47%, |
            | 48%, 49%, 50%, 51%, 52%, 53%, |
            | 54%, 55%, 56%, 57%, 58%, 59%, |
            | 60%, 61%, 62%, 63%, 64%, 65%, |
            | 66%, 67%, 68%, 69%, 70%, 71%, |
            | 72%, 73%, 74%, 75%, 76%, 77%, |
            | 78%, 79%, 80%, 81%, 82%, 83%, |
            | 84%, 85%, 86%, 87%, 88%, 89%, |
            | 90%, 91%, 92%, 93%, 94%, 95%, |
            | 96%, 97%, 98%                 |
 title      | 130%, 210%, 222%, 240%, 245%, | ca, cs, de, el, en, es, fr, it,
            | 246%, 247%, 730%, 740%        | no, pt, ru, sk, sv, uk
 author     | 100%, 110%, 111%, 700%, 710%, | ca, cs, de, el, en, es, fr, it,
            | 711%, 720%                    | no, pt, ru, sk, sv, uk
 abstract   | 520%                          | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 keyword    | 653%                          | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 series     | 830%, 440%, 490%              | ca, en, es
 subject    | 600%, 610%, 611%, 650%, 651%  | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 fulltext   | 8564%u                        | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 collection | 980%                          | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 year       | 260%c, 973%y                  | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 record_ID  | 001                           | ca, cs, de, el, en, es, fr, it,
            |                               | no, pt, ru, sk, sv, uk
 issn       | 773%x, 022%a                  | ca, en, es, fr


The indexes map one-to-one to these logical fields, *except* for
keyword.  What we have done is to keep the proper subject tags on the
official 600, 610, 611, 650 and 651, and keyword as 653, but merge them
as *indexes*, so the index for keyword (Pàgina inicial > Admin Area >
Manage Indexes) has both the subject and the keyword fields.  That's the
solution we've come up with.
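
To illustrate the merge, here is a small sketch of how the tag list of
an index can be derived from the logical fields. The tag patterns are
the ones from the table above; the dictionaries `LOGICAL_FIELDS` and
`INDEX_SOURCES` and the helper `tags_for_index` are invented for this
example and do not reflect how bibindex stores its configuration.

```python
# Logical fields and their MARC tag patterns (from the overview table).
LOGICAL_FIELDS = {
    "subject": ["600%", "610%", "611%", "650%", "651%"],
    "keyword": ["653%"],
}

# Which logical fields feed each index (hypothetical structure).
# The subject index stays one-to-one; the keyword index gets both.
INDEX_SOURCES = {
    "subject": ["subject"],
    "keyword": ["keyword", "subject"],
}


def tags_for_index(index_name):
    """Return the tag patterns that end up in an index, without duplicates."""
    tags = []
    for field in INDEX_SOURCES[index_name]:
        for tag in LOGICAL_FIELDS[field]:
            if tag not in tags:
                tags.append(tag)
    return tags


keyword_tags = tags_for_index("keyword")
subject_tags = tags_for_index("subject")
```

The logical fields themselves stay MARC-clean; only the keyword index
is wider than its field.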

As for bibformat and friends, that is:

 lib/python/invenio/bibformat_elements/
 etc/bibformat/format_templates/
 etc/bibformat/output_formats/

I keep them under guilt patches (http://repo.or.cz/w/guilt.git or
http://packages.debian.org/guilt), but they would only apply to a 0.99.1
release.  I can happily send you a tarball for each; but please
understand there is a mix of better, worse and bad solutions, as I have
been learning to tame the beast over those years.

I'll get back to you in a while.

Cheers,

Ferran


Hide certain MARC subfields from xml output in the search interface

2011-11-25 Thread Theodoros Theodoropoulos

Hello everyone,

There is a need to store some 'sensitive' data in the MARC record, which
should be viewable/editable by the librarian but should not appear in
the xm/MARCXML/(text)MARC output formats of the search interface.


After spending some time testing where this could be applied, I
realized that although a simple check could be put in the
print_record function of search_engine.py, the xm format of the record
is already cached and is read and displayed as it is, which renders
this 'hack' useless.


I verified that if I force on_the_fly=True in the format_record function
of bibformat.py, I get what I want, but with a HUGE performance drop,
and this is unacceptable.


Is there another way to make this work? Is the cached xm data (in the
bibfmt table) used for anything other than display? Should I try to
strip the sensitive data from the record only when updating this table?
Is this possible?
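
To show what I mean by stripping, here is a rough stand-alone sketch
that removes selected subfields from a MARCXML record string. It uses
only the Python standard library and does not touch Invenio at all; the
tag 595 $a, the name `strip_hidden`, and the assumption that the record
element carries no XML namespace are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of (tag, subfield code) pairs to hide.
HIDDEN = {("595", "a")}


def strip_hidden(marcxml, hidden=HIDDEN):
    """Return the MARCXML of one record with the hidden subfields removed.

    Datafields left without any subfield are dropped as well.
    Assumes an un-namespaced <record> root element.
    """
    record = ET.fromstring(marcxml)
    for datafield in list(record.findall("datafield")):
        for subfield in list(datafield.findall("subfield")):
            if (datafield.get("tag"), subfield.get("code")) in hidden:
                datafield.remove(subfield)
        if len(datafield) == 0:
            record.remove(datafield)
    return ET.tostring(record, encoding="unicode")


sample = (
    '<record>'
    '<controlfield tag="001">1</controlfield>'
    '<datafield tag="595" ind1=" " ind2=" ">'
    '<subfield code="a">secret</subfield></datafield>'
    '<datafield tag="245" ind1=" " ind2=" ">'
    '<subfield code="a">Title</subfield></datafield>'
    '</record>'
)
public_xml = strip_hidden(sample)
```

Whether such a filter could be hooked in at the point where the cached
xm value is written is exactly what I do not know.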


Any ideas are welcome!

Thanks in advance for your time,
Best regards,
Theodoros Theodoropoulos