On Jan 19, 2007, at 6:53 AM, Erik Hatcher wrote:

Great information.  I apologize for being a latecomer to the game
and bringing up FAQs.

Well, I'm not certain Q is A-ed very F.

What about date normalization?

One thing that must be considered when doing faceted browsing is that
it works best with some pre-processed data, such as years rather than
full dates.  The question becomes: where does the logic for stripping
out the years belong?  Solr could do it if configured with a custom
analyzer for certain fields, or the client could do it.  Is there
XSLT available to do this sort of thing with dates?

I think the XSLT side of this has been hashed out.
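If the client ends up doing it, the year-stripping step is small either way. A rough sketch in Python (the regex and the sample date strings are my assumptions about what the raw data looks like, not anything from a particular catalog):

```python
import re

def extract_year(date_string):
    """Pull a four-digit year out of a free-text date string, if present."""
    # Lookarounds avoid matching four digits embedded in a longer number.
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", date_string)
    return match.group(1) if match else None

# Normalizing a few typical raw date strings before indexing:
for raw in ["c1999.", "[1876?]", "May 12, 2003"]:
    print(raw, "->", extract_year(raw))
```

The same logic could just as well live in a Solr analyzer or an XSLT template; the point is only that the normalization happens somewhere before faceting.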

A couple of questions come to mind on the side of the underlying
search engine. For simple dates, say plain years, straight text
matching is okay. But when the record spans some time interval, say
the 1940s or 1776-1783 or the 17th century, it would be nice to have
some support from the underlying search engine. Then maybe the
occasional user wants to express a query as an interval. I've done
the "list every year in the interval" approach, and it works if
everything is a year, but I find it a bit unsatisfying. The kind of
date operations that you find in RDBMSes seem to be useful. I'm not
certain what Lucene offers. But for facets, I guess a list of
discrete years is just what you're likely to have to work with.
Displaying the year facet and offering a range could be interesting.

Apologies for rambling. I woke up a long time ago in another time zone.

-Tod

On Jan 19, 2007, at 5:58 AM, Tod Olson wrote:

On Jan 19, 2007, at 4:07 AM, Erik Hatcher wrote:

On Jan 17, 2007, at 3:26 PM, Andrew Nagy wrote:
One thing I am hoping can come out of the preconference is a
standard XSLT doc.  I sat down with my metadata librarian to develop
our XSLT doc -- determining what fields are to be searchable, what
fields should be left out to help speed up results, etc.

It's pretty easy; I think you will be amazed how fast you can have a
functioning system with very little effort.

You're quite right with that last statement.

I am, however, skeptical of a purely MARC -> XSLT -> Solr solution.
The MARC data I've seen requires some basic cleanup (removing dots
at the end of subjects, normalizing dates, etc.) in order to be
useful as facets.  While XSLT is powerful, this type of data
manipulation is better (IMO) done with scripting languages that
allow for easy tweaking in a succinct way.  I'm sure XSLT could do
everything that you'd want done; you can also drive screws in with a
hammer :)

So the punctuation stripping has already been done in XSLT.

LoC has a MARCXML -> MODS XSLT stylesheet [1] which strips out the
evil ISBD punctuation. I've generally found mapping from MODS to be
more convenient than mapping from MARC, so while it's an extra step,
it does save a little programmer time, since some of the hidden
hierarchy in the MARC data is made explicit in the MODS structure.

If hopping through MODS is unacceptable, the LoC has the
punctuation-stripping nicely tucked away in a MARC Conversion
Utility Stylesheet that you could use directly in a MARC XML -> Solr
transformation. [2]

[1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS.xsl
[2] http://www.loc.gov/marcxml/xslt/MARC21slimUtils.xsl


Tod Olson <[EMAIL PROTECTED]>
Programmer/Analyst
University of Chicago Library
