Dear Tibor,

On 6 May 2011 at 10:21, Tibor Simko wrote:
> On Thu, 05 May 2011, Johnny Mariéthoz wrote:
>> Error when putting the term ''non-meat'' into db
>> (hitlist=intbitset([22464])): (1062, "Duplicate entry '16777215' for
>> key 1")
>
> The duplicate entry problem is related to incremental indexing of badly
> washed/truncated index terms before they are pushed to index. It could
> happen due to bad UTF-8 characters, due to change in word breaking
> procedures, etc. We have seen it too on our servers, mostly for
> full-text indexing.
>
> We believe we have fixed this problem in the latest git master branch;
> but these fixes concern Invenio v1.0 release series only. If you are on
> v0.99 release series, then some back-porting may be needed. Do you get
> these troubles on RERO DOC running Invenio v0.99.1?

Yes, this problem happens on our production server, which runs Invenio v0.99.1.

> In any case, rebuilding all your indexes from scratch (via bibindex -R)
> should fix the problem for some time to come, even without patching your
> sources. Because I think you see this problem only with incremental
> indexing; it should not happen during full re-indexing. Is that right?

I do not want to reindex all the files, as it would take too much time. Moreover, I think we have a huge number of words: many of our documents contain OCR text, which adds a lot of new words to the index table.

Can I safely change the type of the id in the idxWORD09F table from MediumInt to Int? Are there other tables that use this id? Is it a good idea?

Note: in the past I tried to re-index all the documents, but it took one full day and crashed the machine because it ran out of memory. I tried several options (-M, -f) with bibindex, without success. The cause is one of our collections, a scanned newspaper spanning 200 years, which represents about 60000 scanned PDF files. I exclude this collection from full-text indexing.

Thanks for your answers.
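A note on the number in the error message: 16777215 is exactly the largest value an unsigned MEDIUMINT column can hold in MySQL, which would be consistent with an auto-increment id that has run out of values. The small check below is only an illustration of that arithmetic. It assumes the id column is declared MEDIUMINT UNSIGNED, which the message itself does not confirm, and the names used (MYSQL_INT_BYTES, unsigned_max, failing_id) are made up for the example and are not part of Invenio.

```python
# Storage sizes in bytes for two MySQL integer types.
MYSQL_INT_BYTES = {"MEDIUMINT": 3, "INT": 4}


def unsigned_max(type_name):
    """Largest value an UNSIGNED column of the given MySQL type can store."""
    return 2 ** (8 * MYSQL_INT_BYTES[type_name]) - 1


# Value quoted in the "Duplicate entry" error above.
failing_id = 16777215

print(unsigned_max("MEDIUMINT"))                # 16777215
print(unsigned_max("INT"))                      # 4294967295
print(failing_id == unsigned_max("MEDIUMINT"))  # True
```

If that assumption holds, an unsigned INT id would leave room for roughly 256 times as many distinct ids. Whether other tables store the same id in a narrower column is still the open question above.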
Regards,

> Best regards
> --
> Tibor Simko

----------------------------------------------------------------------
Johnny Mariéthoz
RERO, Av. de la Gare 45, CH - 1920 MARTIGNY
Telephone: +41(0)27 721 8579
Fax: +41(0)27 721 8586
Web: http://www.rero.ch
ReroDoc: http://doc.rero.ch, [email protected]
----------------------------------------------------------------------
