#688: generalization of second-order operators in SEQP
------------------------+----------------------
  Reporter:  valkyrie   |      Owner:  valkyrie
      Type:  task       |     Status:  new
  Priority:  major      |  Milestone:
 Component:  WebSearch  |    Version:
Resolution:             |   Keywords:  syntax
------------------------+----------------------

Comment (by tbrooks):

 The use case, and immediate driver for this ticket, as described above, is

 find cc US

 This search, in SPIRES, in the HEP collection, means: find all papers
 from institutions whose institution records have country code = US.
 This is basically a join, and the join is on 100u/700u (HEP) <-> 110u
 (inst).
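
 The join can be sketched as two lookups. First, collect the 110u
 identifiers of Inst records whose country code is US. Then, select the
 HEP records whose 100u/700u subfields contain one of those identifiers.
 This is a minimal sketch over in-memory records and assumes each record
 is a dict from a tag+subfield key to a list of values. The "country_code"
 key and the sample data are hypothetical. This is not Invenio's record
 structure or search API.

```python
from typing import Dict, List, Set

# Hypothetical record shape: {field_key: [values]}. Keys "100u", "700u", "110u"
# follow the MARC tags in the ticket; "country_code" is an assumed key.
Record = Dict[str, List[str]]


def institutions_in_country(inst: Dict[int, Record], country: str) -> Set[str]:
    """Step 1: collect the 110u identifiers of Inst records in `country`."""
    keys: Set[str] = set()
    for rec in inst.values():
        if country in rec.get("country_code", []):
            keys.update(rec.get("110u", []))
    return keys


def papers_from_institutions(hep: Dict[int, Record], keys: Set[str]) -> Set[int]:
    """Step 2: HEP records whose 100u or 700u value matches one of the keys."""
    hits: Set[int] = set()
    for recid, rec in hep.items():
        affiliations = rec.get("100u", []) + rec.get("700u", [])
        if keys.intersection(affiliations):
            hits.add(recid)
    return hits


# Hypothetical sample data.
inst = {
    1: {"110u": ["SLAC"], "country_code": ["US"]},
    2: {"110u": ["CERN"], "country_code": ["CH"]},
}
hep = {
    10: {"100u": ["SLAC"]},
    11: {"100u": ["CERN"], "700u": ["SLAC"]},
    12: {"100u": ["CERN"]},
}

us_keys = institutions_in_country(inst, "US")
print(sorted(papers_from_institutions(hep, us_keys)))  # [10, 11]
```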

 As Valkyrie and I mused, we could do this at indexing time, but it is
 tricky, since the indexer needs to update index entries for records that
 have not changed (i.e., the inst. record connected to this paper has
 changed but the paper hasn't, yet the paper's index entries still need to
 be updated). To me, this seems essentially isomorphic to the citation
 case, where the other collection serves as the citation dictionary.
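
 The indexing-time approach is tricky because of this bookkeeping. When an
 Inst record changes, the indexer must find every HEP record that points
 at it and refresh those records' index entries, even though the records
 themselves did not change. That requires a reverse map, much like a
 citation dictionary. This is a minimal sketch that assumes hypothetical
 dict-based records keyed by tag+subfield strings. It is not how Invenio's
 indexer is actually organized.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical record shape: {field_key: [values]}, keyed by MARC tag+subfield.
Record = Dict[str, List[str]]


def build_reverse_map(hep: Dict[int, Record]) -> Dict[str, Set[int]]:
    """Map each affiliation key (a 100u/700u value) to the HEP records using it."""
    reverse: Dict[str, Set[int]] = defaultdict(set)
    for recid, rec in hep.items():
        for key in rec.get("100u", []) + rec.get("700u", []):
            reverse[key].add(recid)
    return reverse


def records_to_reindex(changed_inst: Record, reverse: Dict[str, Set[int]]) -> Set[int]:
    """HEP records whose derived index entries go stale when an Inst record changes."""
    stale: Set[int] = set()
    for key in changed_inst.get("110u", []):
        stale |= reverse.get(key, set())
    return stale


# Hypothetical sample data: the SLAC Inst record was edited.
hep = {
    10: {"100u": ["SLAC"]},
    11: {"100u": ["CERN"], "700u": ["SLAC"]},
    12: {"100u": ["CERN"]},
}
reverse = build_reverse_map(hep)
print(sorted(records_to_reindex({"110u": ["SLAC"]}, reverse)))  # [10, 11]
```

 Papers 10 and 11 must be reindexed although neither record was touched.
 This is the same situation that arises with the citation dictionary.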

 Further complicating the indexing-time solution is the fact that once
 you know which keys to join on, you should really have access to the
 full set of indexes in the other collection. I.e., once I know that
 100u<->110a connects HEP to Inst, I should be able to access all indexes
 of the other collection via this relationship. This argues for a
 configuration + search-time solution rather than an indexing-time one,
 especially since these use cases are usually rare (admin use, occasional
 user use), so speed is not as crucial as flexibility and power.

 As regards the syntax:

 Your example:  (author:"Doe, J" in "ATLAS Notes")

 I'm not sure what it means. Would this be in HEP, searching for ATLAS
 notes written by Doe, J? But how are these notes connected to HEP? What
 is the relation on which we are joining the collections? I don't see
 what this searcher expects to get back, so I can't judge whether the
 syntax makes sense. For me, these 2nd-order extensions are only
 reasonable for "authority file" type collections, where there is a
 sensible mapping from one collection to an index in the other. For
 "ATLAS Notes" and similar collections within HEP, or with a data model
 similar to HEP's, I think joint/combined searching would be handled very
 differently.

 Similarly, author:doe in (refersto:author:ellis and muon) doesn't make
 sense to me, since "in" here seems identical to "and": these are all HEP
 collections.
 There are other use cases of the authority-file kind, for example from
 conferences (find all papers presented at conferences in France), HEPNames
 (find all papers by undergraduates from Case Western), experiments, etc.

 My proposal, again, is to handle the above cases by defining, in a config
 file, which collections can provide second-order search and which fields
 are joined in each case.

 So a sample config file would look like:

 HEP:affiliation::Institutions::110_u

 (meaning: in HEP, one searches the affiliation index using the 110_u
 value from the inst record)

 HEP:cnum::Conferences::<whatever>
 HEP:author::HEPNames::100_i  (etc etc)
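
 This config format could be read with a very small parser. The sketch
 below assumes each line has the form COLLECTION:INDEX::AUTHORITY::FIELD.
 It treats anything after whitespace in the field part as a comment, such
 as the "(etc etc)" above, which is an assumption rather than a spec. The
 JoinRule name and its attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class JoinRule:
    """One line of the proposed config (hypothetical representation)."""
    collection: str  # collection the user searches, e.g. "HEP"
    index: str       # index searched in `collection`, e.g. "affiliation"
    authority: str   # collection providing 2nd-order search, e.g. "Institutions"
    field: str       # authority field whose values are carried back, e.g. "110_u"


def parse_join_rules(text: str) -> Dict[Tuple[str, str], JoinRule]:
    """Parse COLLECTION:INDEX::AUTHORITY::FIELD lines, keyed by (collection, authority).

    Blank lines are skipped; a malformed line raises ValueError.
    """
    rules: Dict[Tuple[str, str], JoinRule] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        left, authority, rest = line.split("::", 2)
        collection, index = left.split(":", 1)
        field = rest.split()[0]  # drop trailing commentary such as "(etc etc)"
        rules[(collection, authority)] = JoinRule(collection, index, authority, field)
    return rules


SAMPLE = """\
HEP:affiliation::Institutions::110_u
HEP:author::HEPNames::100_i  (etc etc)
"""
rules = parse_join_rules(SAMPLE)
print(rules[("HEP", "Institutions")])
# JoinRule(collection='HEP', index='affiliation', authority='Institutions', field='110_u')
```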

 And then we can search using

 Institutions:<any inst index>:<search term> and bring that back to HEP via
 the above relation.
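
 A search of the form Institutions:<any inst index>:<search term> could be
 evaluated in three steps. First, run an ordinary search in the authority
 collection on any of its indexes. Next, gather the join-field values
 (e.g. 110_u) of the hits. Finally, search the HEP index named in the
 config for those values. The sketch below is minimal. It assumes
 hypothetical in-memory collections and uses an exact-match stand-in for
 a real index lookup. The function names are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical in-memory collections: recid -> {index_or_field_name: [values]}.
Collection = Dict[int, Dict[str, List[str]]]


def search(coll: Collection, index: str, term: str) -> Set[int]:
    """First-order search: exact match of `term` in `index` (stand-in for a real index)."""
    return {recid for recid, rec in coll.items() if term in rec.get(index, [])}


def second_order_search(
    target: Collection, target_index: str,
    authority: Collection, join_field: str,
    authority_index: str, term: str,
) -> Set[int]:
    """Evaluate AUTHORITY:<authority_index>:<term> and bring the hits back to `target`."""
    # 1. Search the authority collection on any of its indexes.
    # 2. Collect the join-field values of the matching authority records.
    keys: Set[str] = set()
    for recid in search(authority, authority_index, term):
        keys.update(authority[recid].get(join_field, []))
    # 3. Search the target index for those values.
    hits: Set[int] = set()
    for key in keys:
        hits |= search(target, target_index, key)
    return hits


# Hypothetical sample data, using the HEP:affiliation::Institutions::110_u relation.
institutions = {
    1: {"110_u": ["SLAC"], "cc": ["US"]},
    2: {"110_u": ["CERN"], "cc": ["CH"]},
    3: {"110_u": ["Case Western"], "cc": ["US"]},
}
hep = {
    10: {"affiliation": ["SLAC"]},
    11: {"affiliation": ["CERN"]},
    12: {"affiliation": ["Case Western", "CERN"]},
}
print(sorted(second_order_search(hep, "affiliation", institutions, "110_u", "cc", "US")))  # [10, 12]
```

 Because step 1 is an ordinary search, any index of the authority
 collection becomes available through the relation. The config does not
 need to enumerate those indexes.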

 The point of this syntax is that <any inst index> should be prefixed in
 such a way that it is clear we intend to use it in inst. However,
 <any inst index>:<search term> in inst is also reasonable, just a bit
 more SPIRES-y, so it doesn't fit as well with Invenio syntax. I'm not
 sure which one is easier to parse, but I liked the analogy to refersto:,
 in that refersto invokes a similar second-order operation (indirect
 search). Whatever is easy to parse in the parser, and reasonably general,
 is fine with me.
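
 Both candidate syntaxes can be reduced to the same (collection, index,
 term) triple. The sketch below is a regex-based illustration of that. It
 is not the Invenio query parser. The alias table and the
 SecondOrderQuery name are hypothetical.

```python
import re
from typing import Dict, NamedTuple, Optional


class SecondOrderQuery(NamedTuple):
    authority: str  # collection that provides the second-order search
    index: str      # index to search inside that collection
    term: str       # search term


# Hypothetical alias table so "inst" and "Institutions" name the same collection.
ALIASES: Dict[str, str] = {"inst": "Institutions", "institutions": "Institutions"}

# Prefix form, analogous to refersto:  Institutions:cc:US
PREFIX_RE = re.compile(r"^(?P<coll>\w+):(?P<index>\w+):(?P<term>.+)$")
# SPIRES-like suffix form:  cc:US in inst
SUFFIX_RE = re.compile(r"^(?P<index>\w+):(?P<term>.+?)\s+in\s+(?P<coll>\w+)$")


def parse_second_order(query: str) -> Optional[SecondOrderQuery]:
    """Return the (authority, index, term) triple, or None if neither form matches."""
    query = query.strip()
    match = SUFFIX_RE.match(query) or PREFIX_RE.match(query)
    if match is None:
        return None
    coll = ALIASES.get(match["coll"].lower(), match["coll"])
    return SecondOrderQuery(coll, match["index"], match["term"].strip())


print(parse_second_order("Institutions:cc:US"))
print(parse_second_order("cc:US in inst"))
```

 Both calls yield the same triple. A real parser would also need to check
 the collection name against the configured authorities, since the prefix
 pattern alone would also accept refersto:author:ellis. It would also have
 to resolve terms that themselves contain " in ".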


 OK, regardless of these general concerns, we need to implement something
 for find cc US. We could hack either searching or indexing as a one-off;
 however, I think making it more general would be better.

-- 
Ticket URL: <http://invenio-software.org/ticket/688#comment:3>
Invenio <http://invenio-software.org>
