#358: BibIndex: check treatment of deleted records
-----------------------+----------------------------------------------------
  Reporter:  simko     |       Owner:  simko
      Type:  defect    |      Status:  new  
  Priority:  blocker   |   Milestone:  v1.0 
 Component:  BibIndex  |     Version:       
Resolution:            |    Keywords:       
-----------------------+----------------------------------------------------

Comment (by simko):

 I had a look at one record (872692) and:

  * it was first uploaded on 2010-10-14 12:09:14, already in this DELETED
    form, and it never existed on INSPIRE before (so it was perhaps not
    necessary to upload it from SPIRES at all, but OK);

  * here are the bibindex jobs that ran that day; they look normal:

 {{{
     2010-10-14 06:36:37 --> Task #1003 (bibindex) started
     2010-10-14 06:37:32 --> Task #1003 (bibindex) exited
     2010-10-14 08:47:33 --> Task #1003 (bibindex) started
     2010-10-14 11:18:07 --> Task #1003 (bibindex) exited
     2010-10-14 13:22:13 --> Task #1003 (bibindex) started
     2010-10-14 15:09:22 --> Task #1003 (bibindex) exited
     2010-10-14 16:09:22 --> Task #1003 (bibindex) started
     2010-10-14 16:10:33 --> Task #1003 (bibindex) exited
 }}}

  * we no longer have the indexing logs for that day, but apparently
    the record's 980 value was not indexed, because:

 {{{
     In [7]: 872692 in search_pattern(None, p='collection:"DELETED"')
     Out[7]: False
 }}}

  * I have forced reindexing of 872692, but BibIndex finds no update to
    be done:

 {{{
 2010-11-18 14:47:41 --> idxWORD02F contains 14 words from 877832 records
 2010-11-18 14:47:41 --> idxWORD02F is in consistent state
 2010-11-18 14:47:41 --> idxWORD02F for 872692-872692 is in consistent state
 2010-11-18 14:47:41 --> idxWORD02F adding records #872692-#872692 started
 2010-11-18 14:47:41 --> Updating task progress to idxWORD02F adding recs 872692-872692.
 2010-11-18 14:47:41 --> idxWORD02F fetching existing words for records #872692-#872692 started
 2010-11-18 14:47:41 --> idxWORD02F fetching existing words for records #872692-#872692 ended
 2010-11-18 14:47:41 --> ... record 872692 was declared deleted, removing its word list
 2010-11-18 14:47:41 --> ... record 872692, termlist: []
 2010-11-18 14:47:41 --> idxWORD02F adding records #872692-#872692 ended
 2010-11-18 14:47:41 --> idxWORD02F normal wordtable flush started
 2010-11-18 14:47:41 --> ...updating 0 words into idxWORD02F started
 2010-11-18 14:47:41 --> Updating task progress to idxWORD02F flushed 0/0 words.
 2010-11-18 14:47:41 --> UPDATE idxWORD02R SET type='TEMPORARY' WHERE id_bibrec
                 BETWEEN 872692 AND 872692 AND type='CURRENT'
 2010-11-18 14:47:41 --> ...updating 0 words into idxWORD02F ended
 2010-11-18 14:47:41 --> ...updating reverse table idxWORD02R started
 2010-11-18 14:47:41 --> UPDATE idxWORD02R SET type='CURRENT' WHERE id_bibrec
                 BETWEEN 872692 AND 872692 AND type='FUTURE'
 2010-11-18 14:47:41 --> DELETE FROM idxWORD02R WHERE id_bibrec
                 BETWEEN 872692 AND 872692 AND type='TEMPORARY'
 2010-11-18 14:47:41 --> End of updating wordTable into idxWORD02F
 2010-11-18 14:47:41 --> ...updating reverse table idxWORD02R ended
 2010-11-18 14:47:41 --> idxWORD02F normal wordtable flush ended
 2010-11-18 14:47:41 --> Updating task progress to idxWORD02F flush ended.
 2010-11-18 14:47:41 --> 1 records took 0.0 seconds to complete.(8143 recs/min)
 }}}
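
 The three SQL statements in the log above appear to promote staged
 reverse-table rows in place. A minimal sketch of that swap, using an
 in-memory SQLite table as a stand-in: the column layout, the sample rows
 and the helper name are assumptions for illustration, and only the three
 statements themselves are taken from the log.

```python
import sqlite3

# Hypothetical stand-in for idxWORD02R; the column layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idxWORD02R ("
    " id_bibrec INTEGER NOT NULL,"
    " termlist TEXT,"
    " type TEXT NOT NULL,"
    " PRIMARY KEY (id_bibrec, type))"
)
# Assumed starting point: an old CURRENT row plus a staged FUTURE row.
conn.executemany(
    "INSERT INTO idxWORD02R VALUES (?, ?, ?)",
    [(872692, "old terms", "CURRENT"), (872692, "new terms", "FUTURE")],
)


def promote_future_rows(conn, low, high):
    """Run the three statements from the log, in the same order."""
    # 1. Park the rows that are currently live.
    conn.execute(
        "UPDATE idxWORD02R SET type='TEMPORARY' "
        "WHERE id_bibrec BETWEEN ? AND ? AND type='CURRENT'",
        (low, high),
    )
    # 2. Make the staged rows live.
    conn.execute(
        "UPDATE idxWORD02R SET type='CURRENT' "
        "WHERE id_bibrec BETWEEN ? AND ? AND type='FUTURE'",
        (low, high),
    )
    # 3. Drop the parked rows.
    conn.execute(
        "DELETE FROM idxWORD02R "
        "WHERE id_bibrec BETWEEN ? AND ? AND type='TEMPORARY'",
        (low, high),
    )
    conn.commit()


promote_future_rows(conn, 872692, 872692)
rows = conn.execute(
    "SELECT id_bibrec, termlist, type FROM idxWORD02R"
).fetchall()
print(rows)
```

 After the swap only the formerly staged row is left, now marked CURRENT.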

  * Indeed the word list is empty:

 {{{
    In [9]: marshal.loads(zlib.decompress(run_sql("SELECT * FROM idxWORD02R WHERE id_bibrec=872692")[0][1]))
    Out[9]: []
 }}}
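
 The decoding call above implies that the term list is stored as a
 zlib-compressed `marshal` dump. A small round-trip sketch under that
 assumption; the two helper names are hypothetical and not Invenio
 functions:

```python
import marshal
import zlib


def serialize_termlist(terms):
    """Pack a term list the way the stored blob appears to be encoded."""
    return zlib.compress(marshal.dumps(terms))


def deserialize_termlist(blob):
    """Inverse of serialize_termlist; mirrors the decoding call above."""
    return marshal.loads(zlib.decompress(blob))


# An emptied word list survives the round trip as an empty list.
empty_roundtrip = deserialize_termlist(serialize_termlist([]))
# A non-empty list (sample terms are made up) round-trips unchanged too.
sample_roundtrip = deserialize_termlist(serialize_termlist(["deleted", "hep"]))
print(empty_roundtrip, sample_roundtrip)
```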

 So it is not good that BibIndex empties even the `collection` index for
 deleted records: we end up in a kind of catch-22, because webcoll would
 like to use exactly this index to find and remove deleted records.

 The solution is therefore to change the webcoll query to use `980`
 explicitly, which of course works well:

 {{{
     In [12]: 872692 in search_pattern(None, p='980:"DELETED"')
     Out[12]: True
 }}}

 but we will have to beware of this, since the tag-based query depends on
 the bibxxx tables, should we ever really want to drop them one day.
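
 The catch-22 and the workaround described above can be illustrated with a
 toy model. This is only a sketch: none of the names below are Invenio
 APIs, the second record is made up, and the dictionaries merely stand in
 for the word index and for the record metadata.

```python
# Toy model only: none of these names are Invenio APIs.
RECORDS = {
    872692: {"980": ["DELETED"]},
    100001: {"980": ["HEP"]},  # made-up second record for contrast
}


def is_deleted(fields):
    """A record counts as deleted when its 980 field says DELETED."""
    return "DELETED" in fields.get("980", [])


def build_collection_index(records):
    """Map each indexed 980 value to record IDs.

    Deleted records get an empty term list, as in the log above, so
    their 980 value never reaches the index.
    """
    index = {}
    for recid, fields in records.items():
        terms = [] if is_deleted(fields) else fields.get("980", [])
        for term in terms:
            index.setdefault(term, set()).add(recid)
    return index


def search_index(index, value):
    """Index-based lookup, analogous to the collection:"DELETED" query."""
    return index.get(value, set())


def search_metadata(records, tag, value):
    """Metadata-based lookup, analogous to the 980:"DELETED" query."""
    return {
        recid
        for recid, fields in records.items()
        if value in fields.get(tag, [])
    }


index = build_collection_index(RECORDS)
found_via_index = 872692 in search_index(index, "DELETED")
found_via_metadata = 872692 in search_metadata(RECORDS, "980", "DELETED")
print(found_via_index, found_via_metadata)
```

 In this model the index lookup misses the deleted record while the lookup
 on the metadata finds it, which matches the two queries shown earlier.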

 Submitting these investigation notes here for our amusement. :)

-- 
Ticket URL: <http://invenio-software.org/ticket/358#comment:1>
Invenio <http://invenio-software.org>
