Improving the public interface search was something that we investigated a 
great deal over the spring and early summer based on feedback from a number of 
institutions using the PUI. Unfortunately, we determined that making the 
changes required will necessitate a substantial change to the indexing for the 
application. We're working to identify and obtain resources in order to do so 
while maintaining forward progress in other areas of the application.

How the search on the public side currently works is documented only in 
technical terms. I've distilled what we know down for this purpose, but the 
explanation is still rather technical. If there are additional questions on the 
specifics, I'm happy to try to answer them, but I lean on Laney and others on 
the developer side for a better understanding of this area. (Any mistakes in 
interpretation in what's below are mine.) Here is some 
information about how the PUI search currently indexes and weights information 
in order to display results:

  *   ArchivesSpace has multiple indexers (one each essentially for staff side 
information, public side information, and a real-time indexer that updates the 
index as changes are made) but all three put their information into one shared 
index. There is a field called fullrecord which takes nearly all the fields in 
ArchivesSpace and makes them a single field for the purposes of keyword search. 
The PUI indexer indexes fullrecord plus additional information for the 
collection organization display. The 
code that creates the staff interface records is the same as what is used by 
the PUI indexer with some additions for the separate PUI records.

Because there is currently only one index, there is only one fullrecord field 
rather than one for staff and one for public, as you might expect. Having 
everything pull from one index, which includes a field covering almost 
everything in ArchivesSpace, is one of the reasons why information that is not 
displayed in the public interface affects public interface results.
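
To illustrate the general idea (this is a toy sketch in Python, not the actual 
ArchivesSpace indexer code, and the record, the field names, and the list of 
displayed fields are all made up for the example): when one catch-all field is 
built from nearly every field in a record, a keyword can match on text that is 
never shown in the public view.

```python
# Toy illustration only: the record, field names, and DISPLAYED_FIELDS are
# hypothetical and do not come from ArchivesSpace.

def build_fullrecord(record):
    """Join every non-empty value in the record into one searchable string."""
    return " ".join(str(value) for value in record.values() if value)


def keyword_match(term, record):
    """Return True if the term appears as a whole word in the catch-all field."""
    words = build_fullrecord(record).lower().split()
    return term.lower() in words


# Fields a public view might display (an assumption for this example).
DISPLAYED_FIELDS = ("title", "identifier")

record = {
    "title": "Smith family papers",
    "identifier": "MS 101",
    "internal_note": "Donor contact: Jones",  # never displayed publicly
}

displayed_text = " ".join(record[name] for name in DISPLAYED_FIELDS).lower()

# The record matches a search for "jones" through the catch-all field,
# even though "jones" does not appear in anything the public view displays.
matches_search = keyword_match("jones", record)
visible_in_display = "jones" in displayed_text
```

Here matches_search is True while visible_in_display is False, which is the 
kind of mismatch between what is searched and what is shown described above.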

  *   Anything that appears in the fields included in fullrecord is included in 
the index and available to the public and staff sides, though what displays is 
determined by other settings in the views. (This is why unpublished records 
rightly don't appear in the PUI though they can affect search results.) On the 
public side, the most heavily weighted fields are identifier, title, and 
finding aid title, and results whose record type is resource or accession are 
boosted highest, followed by subjects and agents.

For more specifics, the values after the ^ show the magnitude of the weighting.
Currently, these are hard-coded in the solrconfig.xml file and the solr model 
in the backend:

From solrconfig.xml:

  1.  pf = "four_part_id^50" (pf is for Phrase Fields which boosts the score of 
documents in cases where all of the terms in the q parameter appear in close 
proximity)
  2.  qf = "title^25 four_part_id^50 fullrecord" (qf is for Query Fields which 
specifies the fields in the index on which to perform the query)
  3.  bq = "primary_type:resource^100 primary_type:accession^100 
primary_type:subject^50 primary_type:agent_person^50 
primary_type:agent_corporate_entity^30 primary_type:agent_family^30" (bq is for 
Boost Query which specifies a factor by which a term or phrase should be 
"boosted" in importance when considering a match)

Passed into the solr query from the solr model in the backend:

  1.  pf = "four_part_id^4"
  2.  qf = "four_part_id^3 title^2 finding_aid_filing_title^2 fullrecord"
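
To make the weighting values above easier to compare, here is a rough sketch 
in Python (not ArchivesSpace code). It copies the pf, qf, and bq strings 
quoted above, splits them into field/weight pairs, and combines the two sets. 
Two things in it are my assumptions rather than anything confirmed above: that 
the solrconfig.xml values act as defaults which parameters passed in the query 
replace, and that a field listed without a ^ value has a weight of 1. The 
function names are made up for the example.

```python
# Values quoted above from solrconfig.xml.
SOLRCONFIG_PARAMS = {
    "pf": "four_part_id^50",
    "qf": "title^25 four_part_id^50 fullrecord",
    "bq": (
        "primary_type:resource^100 primary_type:accession^100 "
        "primary_type:subject^50 primary_type:agent_person^50 "
        "primary_type:agent_corporate_entity^30 primary_type:agent_family^30"
    ),
}

# Values quoted above as passed in from the solr model in the backend.
BACKEND_PARAMS = {
    "pf": "four_part_id^4",
    "qf": "four_part_id^3 title^2 finding_aid_filing_title^2 fullrecord",
}


def parse_boosts(spec):
    """Split a string like 'title^25 fullrecord' into {name: weight}.

    Assumption: an entry with no ^ value is treated as weight 1.0.
    """
    boosts = {}
    for token in spec.split():
        name, _, weight = token.partition("^")
        boosts[name] = float(weight) if weight else 1.0
    return boosts


def effective_params(config_params, request_params):
    """Combine the two sets of parameters.

    Assumption: values sent with the query replace the solrconfig.xml values
    for the same parameter; parameters only in solrconfig.xml (bq) are kept.
    """
    merged = dict(config_params)
    merged.update(request_params)
    return merged


params = effective_params(SOLRCONFIG_PARAMS, BACKEND_PARAMS)
field_weights = parse_boosts(params["qf"])
type_weights = parse_boosts(params["bq"])
```

Under those assumptions, field_weights puts four_part_id ahead of title and 
finding_aid_filing_title, with fullrecord last, and type_weights puts 
resources and accessions ahead of subjects and agents, which matches the 
ordering described earlier.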

  *   There were some changes made in some v2.3.x and v2.4.x releases of 
ArchivesSpace that made some parameters, such as whether the default operator 
is OR or AND, configurable, but they only work on the staff side because of how 
the PUI works. Changing the operator does not work on the public side because 
the code for the public side overwrites some areas when the final solr query 
gets built before it is sent to solr for retrieval. Also, there are some 
subqueries that are created in the PUI search that have AND and OR hardcoded so 
the final query contains a combination of ORs and ANDs. That is not 
configurable at all. Yale (and possibly Harvard as well, though Johanna would 
have a better sense of this) has done some work to modify search for its own 
purposes, but I believe those changes have been scaled back significantly 
because they saw what we saw in investigating this: as currently set up, making 
a change in one area negatively impacts search in another area, including the 
staff interface.

We believe the only possibility for making substantial, lasting change to the 
PUI search is to refactor how search happens. This is a major undertaking, and 
it's very important to us that doing so not negatively impact how people use 
the PUI or the staff interface now or stop all progress on development in 
general for a significant period of time. Taking the time to identify ways to 
do this, determining the best path forward, and finding resources to pursue it 
is the reason we have not progressed with PUI search the way we were hoping 
earlier in the year.

We are incredibly fortunate that ArchivesSpace has such an active and engaged 
user community and that the application has become so fundamental to people's 
work. We take very seriously the degree to which making significant changes to 
it would impact people's work and want to pursue any such development in as 
thoughtful and responsible a way as we can. As plans progress we will involve 
the community in the discussions as they relate to PUI search specifically.

I hope knowing more about how the search currently works helps, and please do 
reach out if you would like to discuss more before we reach that point.


Now that we have had a few months of experience with the ArchivesSpace PUI here 
at Harvard, we are reviewing user feedback to help us prioritize post-launch 
development needs. One area of concern is the PUI search functionality, as 
we've received multiple reports of unsatisfactory and unexpected search results.

Can you direct us to, or share, documentation on the PUI search 
functionality, including relevance ranking, weighting, and indexed fields? This 
will help us evaluate what may be done locally to improve results, as well as 
participate in the discussion and planning for changes to the core code that 
would improve search results.


