#689: BibAuthorID: improve indexing gateway
-------------------------+--------------------
 Reporter:  simko        |      Owner:  scarli
     Type:  defect       |     Status:  new
 Priority:  blocker      |  Milestone:  v1.0
Component:  BibAuthorID  |    Version:
 Keywords:               |
-------------------------+--------------------
 Besides name tokens, the author index also indexes canonical author IDs
 (e.g. `G.Aad.1`). BibIndex calls a BibAuthorID gateway function that
 returns the canonical author IDs to be indexed for a given record ID.
 However, the whole process is driven from the indexing side, i.e. it
 starts from the list of records whose metadata have been modified since
 the last run.

 This creates a problem when, for example, cataloguers change a canonical
 author ID from `G.Aad.1` to, say, `G.X.Aad.3`: the indexer has no way of
 knowing about the change, nor that the corresponding records must be
 re-indexed.

 We therefore need a new BibAuthorID gateway function that BibIndex would
 call with a timestamp parameter (the timestamp of the last indexing run)
 and that would return the list of record IDs that must be re-indexed as a
 result of paper-claiming activity between the given timestamp and the
 current time.  BibAuthorID tables have all the necessary information, so
 BibAuthorID can answer such a request.

 Technically, in addition to `get_persons_from_recids()`, we would need a
 new function `get_recids_affected_since(yyyymmddhhmmss)` or at least
 `get_persons_affected_since(yyyymmddhhmmss)` and BibIndex would then do
 the adding/cleaning job.  The function should return not only records
 whose canonical IDs were modified, but also those whose IDs were deleted,
 etc.
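
 As a rough illustration of the proposed function, here is a minimal
 sketch that filters a change history by timestamp.  The in-memory
 `CLAIM_LOG` list and its tuple layout are assumptions made for this
 example only; they do not reflect the real BibAuthorID tables, and the
 actual gateway function may need a different signature.

```python
from datetime import datetime

# Hypothetical stand-in for BibAuthorID's change history (NOT the real schema).
# Each entry: (timestamp, recid, old canonical ID or None, new canonical ID or None)
CLAIM_LOG = [
    (datetime(2011, 6, 1, 9, 0, 0), 101, "G.Aad.1", "G.X.Aad.3"),  # ID modified
    (datetime(2011, 6, 2, 14, 30, 0), 102, "J.Doe.2", None),       # ID deleted
    (datetime(2011, 5, 20, 8, 0, 0), 103, None, "A.Smith.1"),      # ID added earlier
]


def get_recids_affected_since(yyyymmddhhmmss, log=CLAIM_LOG):
    """Return sorted record IDs whose canonical author ID changed
    (was added, modified or deleted) at or after the given timestamp."""
    since = datetime.strptime(yyyymmddhhmmss, "%Y%m%d%H%M%S")
    return sorted({
        recid
        for (when, recid, old_id, new_id) in log
        if when >= since and old_id != new_id
    })


print(get_recids_affected_since("20110601000000"))  # -> [101, 102]
```

 Deletions are covered because an entry whose new ID is `None` still
 counts as a change for its record.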

 (An alternative would be for BibAuthorID pages to force re-indexing of
 certain records live, as the claiming happens, by calling bibindex with
 `-w author -a -i 123456` arguments; but the former approach, changing
 what `last-modified-since` means in the case of author indexes, would be
 more robust.)
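
 On the BibIndex side, the changed meaning of `last-modified-since` for
 author indexes could amount to merging two lists of record IDs.  The
 sketch below is only an illustration: the helper name
 `recids_to_reindex` and the sample inputs are hypothetical and are not
 existing BibIndex code.

```python
def recids_to_reindex(metadata_modified, claim_affected):
    """Merge records with modified metadata and records affected by
    claiming activity into one sorted list without duplicates."""
    return sorted(set(metadata_modified) | set(claim_affected))


# Hypothetical inputs: record 101 appears in both sources.
metadata_modified = [7, 12, 101]   # records whose metadata changed since last run
claim_affected = [101, 102]        # records reported by the BibAuthorID gateway

print(recids_to_reindex(metadata_modified, claim_affected))  # -> [7, 12, 101, 102]
```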

-- 
Ticket URL: <http://invenio-software.org/ticket/689>
Invenio <http://invenio-software.org>
