Dear Tibor,

On 6 May 2011 at 10:21, Tibor Simko wrote:

> On Thu, 05 May 2011, Johnny Mariéthoz wrote:
>> Error when putting the term ''non-meat'' into db
>> (hitlist=intbitset([22464])): (1062, "Duplicate entry '16777215' for
>> key 1")
> 
> The duplicate entry problem is related to incremental indexing of badly
> washed/truncated index terms before they are pushed to index.  It could
> happen due to bad UTF-8 characters, due to changes in word-breaking
> procedures, etc.  We have seen it too on our servers, mostly for
> full-text indexing.
> 
> We believe we have fixed this problem in the latest git master branch;
> but these fixes concern Invenio v1.0 release series only.  If you are on
> v0.99 release series, then some back-porting may be needed.  Do you get
> these troubles on RERO DOC running Invenio v0.99.1?
Yes, this problem happens on our production server, which runs Invenio v0.99.1.

> In any case, rebuilding all your indexes from scratch (via bibindex -R)
> should fix the problem for some time to come, even without patching your
> sources.  Because I think you see this problem only with incremental
> indexing; it should not happen during full re-indexing.  Is that right?

I do not want to reindex all the files; it would take too much time. Moreover,
I think we have a huge number of words, because many of our documents contain
OCR text, which creates a lot of new words in the index table.
Can I safely change the type of the id in the idxWORD09F table from MediumInt
to Int?
Are there other tables that use this id?
Is it a good idea?
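
For context, the value 16777215 in the error message equals 2^24 - 1, which is
the upper limit of an unsigned MEDIUMINT in MySQL, so the error is consistent
with the id column having reached its ceiling. A minimal Python sketch of that
arithmetic follows. The helper names, the column name "id", and the column
attributes in the generated statement are assumptions for illustration, not
taken from the Invenio schema; the real definition should be checked (for
example with SHOW CREATE TABLE) before running anything.

```python
# Upper bounds of MySQL unsigned integer types, derived from storage size.
MEDIUMINT_UNSIGNED_MAX = 2**24 - 1  # 3 bytes
INT_UNSIGNED_MAX = 2**32 - 1        # 4 bytes

# Value reported in the "Duplicate entry" error quoted above.
failing_id = 16777215


def is_exhausted(last_id, type_max):
    """Return True when an auto-increment id has reached the type's ceiling."""
    return last_id >= type_max


def widen_statement(table, column="id"):
    """Build an ALTER TABLE statement widening the id column to INT UNSIGNED.

    Assumption: the column is named "id" and is NOT NULL AUTO_INCREMENT.
    Verify against the actual table definition before use.
    """
    return ("ALTER TABLE %s MODIFY %s INT UNSIGNED NOT NULL AUTO_INCREMENT"
            % (table, column))


if __name__ == "__main__":
    print(is_exhausted(failing_id, MEDIUMINT_UNSIGNED_MAX))
    print(is_exhausted(failing_id, INT_UNSIGNED_MAX))
    print(widen_statement("idxWORD09F"))
```

The sketch only prints the statement; it does not connect to the database or
check whether other tables store the same id.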

Note: In the past I tried to re-index all the documents, but it took one full
day and crashed the machine because of a memory problem. I tried several
bibindex options (-M, -f) without success. This is due to one of our
collections, a scanned newspaper covering 200 years, which represents about
60000 scanned PDF files. I have excluded this collection from the fulltext
indexing.

Thanks for your answers.

Regards,

> 
> Best regards
> -- 
> Tibor Simko

----------------------------------------------------------------------
Johnny Mariéthoz
RERO, Av. de la Gare 45, CH - 1920 MARTIGNY
Phone:   +41(0)27 721 8579
Fax:     +41(0)27 721 8586
Web:     http://www.rero.ch
ReroDoc: http://doc.rero.ch, [email protected]
----------------------------------------------------------------------

