Hi.
The 3.8 upgrade offers DOM indexing by default. If you have taken that option (as seen in $KOHA_CONF), the XSL used instead of record.abs (~/koha-dev/etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl) uses a construct (z:id) for the 001, which makes the 001 (if it exists) the Zebra unique id. This means that if you have more than one bib record with the same 001 (as you get if you duplicate a bib, for instance), it will only index the last one, and it won't complain about it at all.

I'm not sure whether it's a hangover from the XML used by authorities, which stores the auth_id in the 001, or from UNIMARC, which might use the 001 as the bib number. Either way, I bet that if you remove the 001 or make it unique then the record will index OK. The better solution is probably to fix the XSL so it does not use z:id for biblios, or maybe to get it to use the 999$c, but the Zebra config scares me.
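If you want to check whether your own records are affected, something like the following might help. It is only a rough sketch, not part of Koha: it assumes you have your bibs available as MARCXML text (how you export them is up to you), and the function name duplicate_001s and the sample data are made up for illustration. The 001 value and biblionumber 14879 come from the record quoted further down; 14880 and 15000 are invented. It groups records by their 001 and reports any 001 shared by more than one record, along with each record's 999$c.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}record'."""
    return tag.rsplit("}", 1)[-1]


def duplicate_001s(marcxml_text):
    """Return {001 value: [999$c values]} for every 001 used more than once.

    Hypothetical helper for illustration; records without a 001 are skipped.
    """
    root = ET.fromstring(marcxml_text)
    seen = defaultdict(list)
    for record in root.iter():
        if local(record.tag) != "record":
            continue
        control_number = None
        biblionumber = None
        for field in record:
            name = local(field.tag)
            if name == "controlfield" and field.get("tag") == "001":
                control_number = (field.text or "").strip()
            elif name == "datafield" and field.get("tag") == "999":
                for sub in field:
                    if sub.get("code") == "c":
                        biblionumber = (sub.text or "").strip()
        if control_number:
            seen[control_number].append(biblionumber)
    return {k: v for k, v in seen.items() if len(v) > 1}


# Illustrative data only: the second and third records are invented.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">201112071555.ls</controlfield>
    <datafield tag="999" ind1=" " ind2=" ">
      <subfield code="c">14879</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">201112071555.ls</controlfield>
    <datafield tag="999" ind1=" " ind2=" ">
      <subfield code="c">14880</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">some-other-id</controlfield>
    <datafield tag="999" ind1=" " ind2=" ">
      <subfield code="c">15000</subfield>
    </datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    for control_number, biblionumbers in duplicate_001s(SAMPLE).items():
        print(control_number, "is shared by biblionumbers", biblionumbers)
```

Any 001 it reports is a candidate for the silent overwrite described above; the biblionumbers listed with it are the records to look at.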
It took ages to find the cause so I hope this helps someone.
Ian
On 01/09/2012 18:11, Doug Kingston wrote:
On 1 September 2012 09:46, Jared Camins-Esakov
<jcam...@cpbibliography.com> wrote:

Doug,

So environment variables are not the issue.  We are carefully managing
those.

Make sure when you are using cron jobs that you set the environment
variables IN YOUR CRONTAB. Setting environment variables elsewhere is a
recipe for confusion and misery down the road. However, this is -- as you
say -- not the problem.


I have tried using the new tool checkNonIndexedBiblios.pl (from patch
6566)
and it indeed finds a few recent biblios that are not indexed.  Using the
-z option to mark them for indexing followed by a manual run of
rebuild_zebra -b -v -z did not get the biblios indexed.  I cranked up the
debugging on zebraidx (by modifying rebuild_zebra.pl and using -v -v) and
did not see any obvious errors in the output that would suggest why
indexing was failing.

Did you change your bibliographic frameworks? It could be a matter of the
biblionumber not being stored properly. The other thing to do is to confirm
that the non-indexed biblios are *actually* getting added to the zebraqueue
by the 6566 script. It's kind of a long shot, but it could be an issue with
the zebraqueue table getting corrupted. I've seen this happen when the
zebraqueue table got too large, and disk space was low.

So I think this is working as expected.  Disk space is ample on the system
in question, and the catalogue is small by most standards (about 2500
biblios).  I ran rebuild_zebra.pl with the -k flag so it left the exported
records and here's the tree I got.

library:/tmp# ls -altR p6tjtKrrK3/
p6tjtKrrK3/:
total 0
drwxrwxrwt 6 root root 1040 Sep  1 17:50 ..
drwx------ 5 koha koha  100 Sep  1 06:36 .
drwxr-xr-x 2 koha koha   60 Sep  1 06:36 upd_biblio
drwxr-xr-x 2 koha koha   60 Sep  1 06:36 del_biblio
drwxr-xr-x 2 koha koha   40 Sep  1 06:36 biblio

p6tjtKrrK3/upd_biblio:
total 16
-rw-r--r-- 1 koha koha 12670 Sep  1 06:36 exported_records
drwxr-xr-x 2 koha koha    60 Sep  1 06:36 .
drwx------ 5 koha koha   100 Sep  1 06:36 ..

p6tjtKrrK3/del_biblio:
total 0
drwx------ 5 koha koha 100 Sep  1 06:36 ..
drwxr-xr-x 2 koha koha  60 Sep  1 06:36 .
-rw-r--r-- 1 koha koha   0 Sep  1 06:36 exported_records

p6tjtKrrK3/biblio:
total 0
drwx------ 5 koha koha 100 Sep  1 06:36 ..
drwxr-xr-x 2 koha koha  40 Sep  1 06:36 .

Using marcprint.py, a small Python program built around the pymarc package, I
decoded this file and found 13 MARC records, as expected.
Example:
=LDR  00871nam a22002417a 4500
=001  201112071555.ls
=003  UkLoVW
=005  20111209110116.0
=008  111207t1982\\\\enkg\\\\r\\\\\001\0\eng\d
=040  \\$aUkLoVW$cUkLoVW
=099  \\$aQS 40
=100  1\$aSheffield, Ken$92330
=245  \0$aTen country dances :$bmainly from Thompson, Wright & Wilson.
=260  \\$aOxford :$b[The Author],$c1982.
=300  \\$a12 p. :$bmusic ;$c30 cm.
=490  1\$aFrom two barns ;$vv. 1
=650  \\$9117$aCountry dances
=650  \\$9127$aDance music
=830  \5$aFrom two barns$92331
=942  \\$2VWML$cBK$hQS 40$n0$6QS_00040
=999  \\$c14879$d14879
=952  \\$w2011-12-07$p10914$r2011-12-07$40$00$6QS_00040$915083$bVWML$10$oQS 40$d2011-12-07$70$cBOX$2VWML$yBK$aVWML
=952  \\$w2011-12-07$p11121$r2011-12-07$40$00$6QS_00040$915084$bVWML$10$oQS 40$d2011-12-07$71$cBOX$2VWML$yBK$aVWML

I have attached an ASCII printout of all 13 records in case someone wants
to look for a pattern in these records.

The problem is either in the format/contents of those records, or in
zebraidx/zebrasrv or their config files.  My suspicion is with the latter
since we have already had to fix one problem there for bug 6566.

-Doug-

Regards,
Jared

--
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcam...@cpbibliography.com
(web) http://www.cpbibliography.com/




_______________________________________________
Koha mailing list  http://koha-community.org
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha

--
Ian Bays
Director of Projects, PTFS Europe Limited
Content Management and Library Solutions
+44 (0) 800 756 6803 (phone)
+44 (0) 7774 995297 (mobile)
+44 (0) 800 756 6384 (fax)
skype: ian.bays
email: ian.b...@ptfs-europe.com
