Hi there!

We could use a helping hand with bibindex.

We have already uploaded and indexed ~775,000 records. To achieve this, we had
to change the data type of all bibxxx, bibrec_bibxxx and all PAIR/WORD/PHRASE
tables from MEDIUMINT to INT, because some ID overflow issues occurred around
record no. ~600,000.
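
The type change itself was done with statements along these lines, one per
table (a sketch, with bib01x as an example; the attribute list follows the
stock Invenio schema, so our exact statements may have differed):

  -- MODIFY replaces the whole column definition, so UNSIGNED, NOT NULL and
  -- AUTO_INCREMENT have to be restated; MySQL silently drops whatever is
  -- not repeated.
  ALTER TABLE bib01x
    MODIFY id INT(15) UNSIGNED NOT NULL AUTO_INCREMENT;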

Now we have the following problem:
bibindex nearly freezes (i.e. runs very, very slowly) when trying to index the
global index. The problem seems to be the flush (see the timestamps):

#########################
[...]

2012-02-15 18:25:17 --> idxWORD01F for 761636-762582 is in consistent state
2012-02-15 18:25:17 --> idxWORD01F adding records #761636-#762582 started
2012-02-15 18:25:44 --> idxWORD01F adding records #761636-#762582 ended
2012-02-15 18:25:44 --> idxWORD01F normal wordtable flush started
2012-02-15 18:25:44 --> ...updating 597577 words into idxWORD01F started
2012-02-15 18:27:17 --> ......processed 59757/597577 words
2012-02-15 18:28:49 --> ......processed 119514/597577 words
2012-02-15 18:30:21 --> ......processed 179271/597577 words
2012-02-15 18:31:55 --> ......processed 239028/597577 words
2012-02-15 18:33:28 --> ......processed 298785/597577 words
2012-02-15 18:35:02 --> ......processed 358542/597577 words
2012-02-15 18:36:34 --> ......processed 418299/597577 words
2012-02-15 18:38:05 --> ......processed 478056/597577 words
2012-02-15 18:39:36 --> ......processed 537813/597577 words
2012-02-15 18:41:07 --> ......processed 597570/597577 words
2012-02-15 18:41:08 --> ...updating 597577 words into idxWORD01F ended
2012-02-15 18:41:08 --> ...updating reverse table idxWORD01R started
2012-02-15 18:41:08 --> ...updating reverse table idxWORD01R ended
2012-02-15 18:41:08 --> idxWORD01F normal wordtable flush ended
2012-02-15 18:41:08 --> idxWORD01F backing up
2012-02-15 18:41:08 --> 150000 records took 10919.2 seconds to complete.(824 recs/min)

[...]

2012-02-15 18:41:08 --> idxWORD01F for 762583-762635 is in consistent state
2012-02-15 18:41:08 --> idxWORD01F adding records #762583-#762635 started
2012-02-15 18:41:10 --> idxWORD01F adding records #762583-#762635 ended

[...]

2012-02-15 18:46:57 --> idxWORD01F adding records #774636-#775635 started
2012-02-15 18:47:27 --> idxWORD01F adding records #774636-#775635 ended
2012-02-15 18:47:27 --> idxWORD01F for 775636-776582 is in consistent state
2012-02-15 18:47:27 --> idxWORD01F adding records #775636-#776582 started
2012-02-15 18:47:53 --> idxWORD01F adding records #775636-#776582 ended
2012-02-15 18:47:53 --> idxWORD01F normal wordtable flush started
2012-02-15 18:47:53 --> ...updating 371042 words into idxWORD01F started
2012-02-16 10:33:04 --> ......processed 37104/371042 words
2012-02-17 01:51:15 --> ......processed 74208/371042 words
2012-02-17 16:47:32 --> ......processed 111312/371042 words
2012-02-18 07:54:41 --> ......processed 148416/371042 words
2012-02-18 22:11:18 --> ......processed 185520/371042 words
2012-02-19 13:33:44 --> ......processed 222624/371042 words
2012-02-20 05:13:12 --> ......processed 259728/371042 words

[...]
#########################

Here are the things we already tested:

- repairing and rebuilding the MySQL indexes of each table, in case these
indexes got corrupted during the change from MEDIUMINT to INT (see the
sketch below)
- MySQL's index optimization tool
- bibindex -w global --repair => reports success, but the problem is still there
- different flush sizes (5,000, 25,000, 50,000)
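
Concretely, the repair/optimization pass was along these lines, run for each
index table (a sketch with idxWORD01F as an example; these are the standard
MySQL maintenance statements, our exact invocations may have differed):

  CHECK TABLE idxWORD01F;     -- look for index corruption
  REPAIR TABLE idxWORD01F;    -- rebuild the (MyISAM) indexes
  OPTIMIZE TABLE idxWORD01F;  -- defragment data and re-sort the index pages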


Then we tried to index the records in small batches.
The last batch (bibindex -w global -f 50000 --id=762635-776582) threw this
exception (invenio.err):

#################################################
Error when putting the term ''1846 illustr'' into db
(hitlist=intbitset([767550])): (1062, "Duplicate entry '0' for key 'PRIMARY'")


The following problem occurred on <http://zb0035.zb.kfa-juelich.de> (Invenio 
0.99.2.1484-875f2)

>> 2012-02-27 10:56:00 -> IntegrityError: (1062, "Duplicate entry '0' for key 
>> 'PRIMARY'")

>>> User details

No client information available

>>> Traceback details

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/invenio/bibindex_engine.py", line 875, 
in put_word_into_db
    (word, set.fastdump()))
  File "/usr/lib/python2.4/site-packages/invenio/dbquery.py", line 160, in 
run_sql
    rc = cur.execute(sql, param)
  File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 163, in 
execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 35, in 
defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '0' for key 'PRIMARY'")
Locals by frame, innermost last
###################################################


See the attached invenio.err if you need the complete log.


The corresponding table (INSERT INTO idxPAIR01F...) has a maximum ID of
16406911, and its id data type is set to INT (as mentioned at the beginning).
Do you have any suggestions as to what may be wrong here?
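
For completeness, these are the checks behind the numbers above (a sketch, run
from the MySQL console):

  SELECT MAX(id) FROM idxPAIR01F;  -- gives 16406911
  SHOW CREATE TABLE idxPAIR01F;    -- shows the id column as INT; worth
                                   -- checking here whether AUTO_INCREMENT
                                   -- survived the ALTER, since without it
                                   -- inserts that omit id default to 0,
                                   -- which would match the duplicate-'0'
                                   -- error above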

Thanks in advance!

Kind regards,

Sebastian Schindler