Jay, I talked to Simeon here at arXiv, and I'm including his response:
<SIMEON>
The issue of having facets describing links out to different data repositories is somewhat interesting from an arXiv perspective (we have DC data now, might have links to other repos at some stage). Thus, from a quick look, this sounds potentially useful, although it is not on our "must have" list. Without understanding the details, the approach looks plausible.

Cheers,
Simeon
</SIMEON>

------------------------------------------
Peter Halliday
Cornell University Library
IT Repositories Group
[email protected]
(Phone:) 607-255-1790
(Cell:) 607-329-6905

On Oct 17, 2011, at 5:22 PM, Jay Luker wrote:

Hi all,

Following up on the subject of facets raised during today's forum, we have an in-progress wiki page [1] describing some of our faceting goals. The part of most interest (or the part I'm most interested in feedback on) is the stuff about data links and how the data is stored in the MARC record. I'm just going to copy and paste that part.

"""
=== Data Links ===

Many ADS records include links out to several types of data products in various archives, so the objective is a facet indicating both the presence of data links and the type of data. This means a hierarchical facet display like the following:

 * ARCHIVES (10)
   * NED (8)
     * Spectra (6)
     * Images (2)
   * Chandra (2)

To accomplish this using Solr's facet.prefix mechanism we would use a field called '''archive_facet''', and the indexed values would look like (for one particular document):

 * 0/NED
 * 1/NED/Spectra
 * 0/NED
 * 1/NED/Spectra
 * 0/NED
 * 1/NED/Images

==== MARC record ====

To store both the archive name and the type of data product we have to get a little creative with the MARC fields. We propose to store this data in the '''856''' field using the following subfields:

 * '''$u''': the link URL.
 * '''$y''': the link title.
 * '''$3''': the link type (e.g. DATA or FULLTEXT).
 * '''$z''': the archive name joined with the data type, i.e. "archive:Chandra:Spectra" or "archive:NED:Images". The "archive:" is meant to act as a namespace in case we have other types of properties in the future.
 * '''$9''': origin/provenance

The Solr indexing process will need to identify the links where the '''$3''' subfield indicates 'DATA', extract the corresponding '''$z''' values and format them for indexing as described above.
"""

Any comments or questions about this approach?

thanks,
--jay

[1] http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/MetadataFacets

--
******************************************************
Jay Luker                 Astrophysics Data System (ADS)
[email protected]         Center for Astrophysics 617-495-4588
60 Garden Street MS 67    617-495-7356 fax
Cambridge, MA 02138
******************************************************
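
For anyone who wants to try the idea out, here is a minimal sketch of the indexing step described in the quoted wiki text: pick the 856 fields whose $3 is DATA, read the "archive:" namespaced $z value, and emit depth-prefixed values for archive_facet. It assumes each 856 field has already been parsed into a plain dict keyed by subfield code. The function names and that dict layout are hypothetical and are not part of ADS, Invenio, or Solr; only the facet, facet.field, and facet.prefix request parameters are Solr's own.

```python
def archive_facet_values(fields_856):
    """Turn parsed 856 fields into depth-prefixed facet values.

    fields_856 is assumed to be a list of dicts keyed by subfield code,
    e.g. {"3": "DATA", "z": "archive:NED:Spectra"} (hypothetical layout).
    """
    values = []
    for subfields in fields_856:
        # Only links typed as DATA contribute to the archive facet.
        if subfields.get("3") != "DATA":
            continue
        parts = subfields.get("z", "").split(":")
        # Skip $z values that are not in the "archive:" namespace.
        if parts[0] != "archive":
            continue
        path = [p for p in parts[1:] if p]
        # Emit one value per level: "0/NED", then "1/NED/Spectra".
        for depth in range(len(path)):
            values.append("%d/%s" % (depth, "/".join(path[:depth + 1])))
    return values


def facet_prefix_params(selected_path):
    """Build Solr request parameters listing the children of selected_path.

    An empty path asks for the top level (prefix "0/"); ["NED"] asks for
    the data types under NED (prefix "1/NED/").
    """
    prefix = "/".join([str(len(selected_path))] + list(selected_path)) + "/"
    return {
        "facet": "true",
        "facet.field": "archive_facet",
        "facet.prefix": prefix,
    }
```

A document with two NED Spectra links and one NED Images link yields the same six values listed in the wiki excerpt. The sketch leaves repeated values in place to match that listing; deduplicating before indexing would be a separate design choice.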

