Hi there! We could use a helping hand with bibindex.
We have already uploaded and indexed ~775,000 records. To achieve this, we had to change the data type of all bibxxx, bibrec_bibxxx and all PAIR/WORD/PHRASE tables from MEDIUMINT to INT (sketched below), because ID-overflow issues occurred around record no. 600,000. Now we have the following problem: bibindex nearly freezes, i.e. runs very, very slowly, when indexing the global index. The bottleneck seems to be the wordtable flush (see the timestamps):

#########################
[...]
2012-02-15 18:25:17 --> idxWORD01F for 761636-762582 is in consistent state
2012-02-15 18:25:17 --> idxWORD01F adding records #761636-#762582 started
2012-02-15 18:25:44 --> idxWORD01F adding records #761636-#762582 ended
2012-02-15 18:25:44 --> idxWORD01F normal wordtable flush started
2012-02-15 18:25:44 --> ...updating 597577 words into idxWORD01F started
2012-02-15 18:27:17 --> ......processed 59757/597577 words
2012-02-15 18:28:49 --> ......processed 119514/597577 words
2012-02-15 18:30:21 --> ......processed 179271/597577 words
2012-02-15 18:31:55 --> ......processed 239028/597577 words
2012-02-15 18:33:28 --> ......processed 298785/597577 words
2012-02-15 18:35:02 --> ......processed 358542/597577 words
2012-02-15 18:36:34 --> ......processed 418299/597577 words
2012-02-15 18:38:05 --> ......processed 478056/597577 words
2012-02-15 18:39:36 --> ......processed 537813/597577 words
2012-02-15 18:41:07 --> ......processed 597570/597577 words
2012-02-15 18:41:08 --> ...updating 597577 words into idxWORD01F ended
2012-02-15 18:41:08 --> ...updating reverse table idxWORD01R started
2012-02-15 18:41:08 --> ...updating reverse table idxWORD01R ended
2012-02-15 18:41:08 --> idxWORD01F normal wordtable flush ended
2012-02-15 18:41:08 --> idxWORD01F backing up
2012-02-15 18:41:08 --> 150000 records took 10919.2 seconds to complete. (824 recs/min)
[...]
2012-02-15 18:41:08 --> idxWORD01F for 762583-762635 is in consistent state
2012-02-15 18:41:08 --> idxWORD01F adding records #762583-#762635 started
2012-02-15 18:41:10 --> idxWORD01F adding records #762583-#762635 ended
[...]
2012-02-15 18:46:57 --> idxWORD01F adding records #774636-#775635 started
2012-02-15 18:47:27 --> idxWORD01F adding records #774636-#775635 ended
2012-02-15 18:47:27 --> idxWORD01F for 775636-776582 is in consistent state
2012-02-15 18:47:27 --> idxWORD01F adding records #775636-#776582 started
2012-02-15 18:47:53 --> idxWORD01F adding records #775636-#776582 ended
2012-02-15 18:47:53 --> idxWORD01F normal wordtable flush started
2012-02-15 18:47:53 --> ...updating 371042 words into idxWORD01F started
2012-02-16 10:33:04 --> ......processed 37104/371042 words
2012-02-17 01:51:15 --> ......processed 74208/371042 words
2012-02-17 16:47:32 --> ......processed 111312/371042 words
2012-02-18 07:54:41 --> ......processed 148416/371042 words
2012-02-18 22:11:18 --> ......processed 185520/371042 words
2012-02-19 13:33:44 --> ......processed 222624/371042 words
2012-02-20 05:13:12 --> ......processed 259728/371042 words
[...]
#########################

Note the jump: the first flush handles ~600,000 words in about 15 minutes, while in the second flush every 10% step of ~37,000 words alone takes roughly 15 hours.

Here is what we have already tested:
- repairing and rebuilding the MySQL indexes of each table (in case these indexes got corrupted during the MEDIUMINT-to-INT change)
- MySQL's index optimization tool
- bibindex -w global --repair => reported success, but the problem is still there
- different flush sizes (5,000, 25,000, 50,000)

After that, we tried to index the records in small batches.
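Regarding the MEDIUMINT-to-INT change mentioned above: it was done with ALTER TABLE statements. The following is only a sketch of their general form (idxWORD01F standing in for each affected table; the exact column names and attributes differ per table, so this is not a transcript of our session):

  -- Widen the primary key column from MEDIUMINT to INT. MODIFY replaces
  -- the whole column definition, so UNSIGNED, NOT NULL and AUTO_INCREMENT
  -- have to be restated explicitly.
  ALTER TABLE idxWORD01F
    MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;

The equivalent statement was run against every bibxxx, bibrec_bibxxx and PAIR/WORD/PHRASE table.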
The last batch (bibindex -w global -f 50000 --id=762635-776582) threw this exception (from invenio.err):

#################################################
Error when putting the term ''1846 illustr'' into db (hitlist=intbitset([767550])):
(1062, "Duplicate entry '0' for key 'PRIMARY'")

The following problem occurred on <http://zb0035.zb.kfa-juelich.de> (Invenio 0.99.2.1484-875f2)

>> 2012-02-27 10:56:00 -> IntegrityError: (1062, "Duplicate entry '0' for key 'PRIMARY'")

>>> User details
No client information available

>>> Traceback details
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/invenio/bibindex_engine.py", line 875, in put_word_into_db
    (word, set.fastdump()))
  File "/usr/lib/python2.4/site-packages/invenio/dbquery.py", line 160, in run_sql
    rc = cur.execute(sql, param)
  File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '0' for key 'PRIMARY'")

Locals by frame, innermost last
###################################################

See the attached invenio.err if you need the complete log. The table in question (the failing statement is an INSERT INTO idxPAIR01F ...) has a maximum ID of 16406911, and its id column is of type INTEGER (as mentioned at the beginning).
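For reference, we checked this directly in MySQL; here is a sketch of the kind of queries involved (not the exact session):

  -- Highest id currently stored in the forward index table
  -- (16406911 in our case):
  SELECT MAX(id) FROM idxPAIR01F;

  -- Full table definition, showing the id column's data type and
  -- whether it is still flagged AUTO_INCREMENT after the conversion:
  SHOW CREATE TABLE idxPAIR01F;

One common cause of "Duplicate entry '0' for key 'PRIMARY'" is an id column that has lost its AUTO_INCREMENT attribute (inserts that omit the id then default to 0), which is why the second query is part of the sketch.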
Do you have any suggestions as to what may be wrong here? Thanks in advance!

Kind regards,
Sebastian Schindler
