Re: Faceting considerations & MARC for data links

Peter J. Halliday Tue, 18 Oct 2011 11:19:49 -0700

Jay,

I talked to Simeon here at arXiv and I'm including his response:


<SIMEON>

The issue of having facets describing links out to different data repositories 
is somewhat interesting from an arXiv perspective (we have DC data now, and 
might have links to other repos at some stage). So, from a quick look, this 
sounds like something potentially useful, although it is not on our "must have" 
list. Without understanding the details, the approach looks plausible.

Cheers,
Simeon
</SIMEON>

------------------------------------------
Peter Halliday
Cornell University Library IT
Repositories Group
[email protected]
(Phone:) 607-255-1790
(Cell:) 607-329-6905






On Oct 17, 2011, at 5:22 PM, Jay Luker wrote:

Hi all,

Following up on the subject of facets raised during today's forum, we have an 
in-progress wiki page [1] describing some of our faceting goals. The part of 
most interest (or the part I'm most interested in feedback on) is the stuff 
about data links and how the data is stored in the MARC record.

I'm just going to copy and paste that part.

"""
=== Data Links ===

Many ADS records include links out to several types of data products in various 
archives, so a facet indicating data links, as well as the type of data, is the 
objective. This means a hierarchical facet display like the following:

* ARCHIVES (10)
  * NED (8)
    * Spectra (6)
    * Images (2)
  * Chandra (2)

To accomplish this using Solr's facet.prefix mechanism we would use a field 
called '''archive_facet''', and the indexed values would look like (for one 
particular document):

* 0/NED
* 1/NED/Spectra
* 0/NED
* 1/NED/Spectra
* 0/NED
* 1/NED/Images

==== MARC record ====

To store both the archive name and the type of data product we have to get a 
little creative with the MARC fields. We propose to store this data in the 
'''856''' field using the following subfields:

* '''$u''': the link URL.
* '''$y''': the link title.
* '''$3''': the link type (e.g. DATA or FULLTEXT).
* '''$z''': the archive name joined with the data type, i.e. 
"archive:Chandra:Spectra" or "archive:NED:Images". The "archive:" is meant to 
act as a namespace in case we have other types of properties in the future.
* '''$9''': origin/provenance

The Solr indexing process will need to identify these links where the '''$3''' 
subfield indicates 'DATA', extract the corresponding '''$z''' values and format 
them for indexing as described above.
"""

Any comments or questions about this approach?

thanks,
--jay

[1] http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/MetadataFacets

--
******************************************************
Jay Luker               Astrophysics Data System (ADS)
[email protected]       Center for Astrophysics
617-495-4588            60 Garden Street  MS 67
617-495-7356 fax        Cambridge, MA  02138
******************************************************
