#358: BibIndex: check treatment of deleted records
-----------------------+----------------------------------------------------
Reporter: simko | Owner: simko
Type: defect | Status: new
Priority: blocker | Milestone: v1.0
Component: BibIndex | Version:
Resolution: | Keywords:
-----------------------+----------------------------------------------------
Comment (by simko):
I had a look at one record (872692) and:
* it was first uploaded on 2010-10-14 12:09:14, already in this DELETED
form, and it never existed on INSPIRE before (so it was perhaps not
necessary to upload it from SPIRES at all, but OK);
* here are the bibindex jobs that ran that day; they look normal:
{{{
2010-10-14 06:36:37 --> Task #1003 (bibindex) started
2010-10-14 06:37:32 --> Task #1003 (bibindex) exited
2010-10-14 08:47:33 --> Task #1003 (bibindex) started
2010-10-14 11:18:07 --> Task #1003 (bibindex) exited
2010-10-14 13:22:13 --> Task #1003 (bibindex) started
2010-10-14 15:09:22 --> Task #1003 (bibindex) exited
2010-10-14 16:09:22 --> Task #1003 (bibindex) started
2010-10-14 16:10:33 --> Task #1003 (bibindex) exited
}}}
* we no longer have the indexing logs for that day, but apparently
the record's 980 value was not indexed, because:
{{{
In [7]: 872692 in search_pattern(None, p='collection:"DELETED"')
Out[7]: False
}}}
* I have forced reindexing of 872692, but there is nothing to
update:
{{{
2010-11-18 14:47:41 --> idxWORD02F contains 14 words from 877832 records
2010-11-18 14:47:41 --> idxWORD02F is in consistent state
2010-11-18 14:47:41 --> idxWORD02F for 872692-872692 is in consistent
state
2010-11-18 14:47:41 --> idxWORD02F adding records #872692-#872692 started
2010-11-18 14:47:41 --> Updating task progress to idxWORD02F adding recs
872692-872692.
2010-11-18 14:47:41 --> idxWORD02F fetching existing words for records
#872692-#872692 started
2010-11-18 14:47:41 --> idxWORD02F fetching existing words for records
#872692-#872692 ended
2010-11-18 14:47:41 --> ... record 872692 was declared deleted, removing
its word list
2010-11-18 14:47:41 --> ... record 872692, termlist: []
2010-11-18 14:47:41 --> idxWORD02F adding records #872692-#872692 ended
2010-11-18 14:47:41 --> idxWORD02F normal wordtable flush started
2010-11-18 14:47:41 --> ...updating 0 words into idxWORD02F started
2010-11-18 14:47:41 --> Updating task progress to idxWORD02F flushed 0/0
words.
2010-11-18 14:47:41 --> UPDATE idxWORD02R SET type='TEMPORARY' WHERE
id_bibrec
BETWEEN 872692 AND 872692 AND type='CURRENT'
2010-11-18 14:47:41 --> ...updating 0 words into idxWORD02F ended
2010-11-18 14:47:41 --> ...updating reverse table idxWORD02R started
2010-11-18 14:47:41 --> UPDATE idxWORD02R SET type='CURRENT' WHERE
id_bibrec
BETWEEN 872692 AND 872692 AND type='FUTURE'
2010-11-18 14:47:41 --> DELETE FROM idxWORD02R WHERE id_bibrec
BETWEEN 872692 AND 872692 AND type='TEMPORARY'
2010-11-18 14:47:41 --> End of updating wordTable into idxWORD02F
2010-11-18 14:47:41 --> ...updating reverse table idxWORD02R ended
2010-11-18 14:47:41 --> idxWORD02F normal wordtable flush ended
2010-11-18 14:47:41 --> Updating task progress to idxWORD02F flush ended.
2010-11-18 14:47:41 --> 1 records took 0.0 seconds to complete.(8143
recs/min)
}}}
* Indeed the word list is empty:
{{{
In [9]: marshal.loads(zlib.decompress(run_sql("SELECT * FROM idxWORD02R WHERE id_bibrec=872692")[0][1]))
Out[9]: []
}}}
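The decoding step used above can be tried in isolation. This is a minimal sketch that assumes the reverse-table blob is simply a zlib-compressed, marshal-serialized Python list, as the `marshal.loads(zlib.decompress(...))` call suggests. The helper names `encode_termlist` and `decode_termlist` are hypothetical and are not part of BibIndex.

```python
import marshal
import zlib


def encode_termlist(terms):
    """Serialize a term list as a zlib-compressed marshal blob (assumed format)."""
    return zlib.compress(marshal.dumps(terms))


def decode_termlist(blob):
    """Inverse of encode_termlist; mirrors the decoding call used in the ticket."""
    return marshal.loads(zlib.decompress(blob))


# A record declared deleted ends up with an empty term list ...
empty_blob = encode_termlist([])
# ... whereas an indexed 980 value would have left at least one term behind.
deleted_blob = encode_termlist(["deleted"])

print(decode_termlist(empty_blob))
print(decode_termlist(deleted_blob))
```

An empty decoded list is what the forced reindexing produced for record 872692, so nothing is left in the index for a `collection:"DELETED"` search to match.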
So it is not good that BibIndex empties even the `collection` index for
deleted records, because this puts us in a kind of catch-22: webcoll would
like to use that very index to remove deleted records.
The solution is therefore to change the webcoll query to use `980`
explicitly, which works well, of course:
{{{
In [12]: 872692 in search_pattern(None, p='980:"DELETED"')
Out[12]: True
}}}
but one has to beware of this in case we would really like to drop the
bibxxx tables one day, since the `980` query presumably depends on them.
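The catch-22 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Invenio code: the in-memory `RECORDS` dictionary, the record 100, and the functions `build_collection_index`, `search_index` and `search_metadata` are all hypothetical stand-ins for the word index and for a direct lookup of the stored 980 value.

```python
# Toy in-memory model of the behaviour described above; not Invenio code.
RECORDS = {
    872692: {"980": ["DELETED"]},  # the record from this ticket
    100: {"980": ["HEP"]},         # hypothetical live record
}


def is_deleted(record):
    """A record counts as deleted when its 980 field carries DELETED."""
    return "DELETED" in record.get("980", [])


def build_collection_index(records):
    """Build term -> record ids, emptying the term list of deleted records."""
    index = {}
    for recid, record in records.items():
        terms = [] if is_deleted(record) else record.get("980", [])
        for term in terms:
            index.setdefault(term, set()).add(recid)
    return index


def search_index(index, term):
    """Look a term up in the word index."""
    return index.get(term, set())


def search_metadata(records, tag, value):
    """Look a value up directly in the stored metadata, bypassing the index."""
    return {recid for recid, rec in records.items() if value in rec.get(tag, [])}


index = build_collection_index(RECORDS)
print(872692 in search_index(index, "DELETED"))              # False
print(872692 in search_metadata(RECORDS, "980", "DELETED"))  # True
```

In this model the index-based lookup can never find a deleted record, because deletion is exactly what empties its term list, while the direct metadata lookup still can.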
Submitting these investigation notes here for our amusement. :)
--
Ticket URL: <http://invenio-software.org/ticket/358#comment:1>
Invenio <http://invenio-software.org>