Re: [CODE4LIB] protocol for obtaining holdings not on/from OCLC

Joe Hourcle Thu, 17 Jan 2008 05:48:05 -0800

On Thu, 17 Jan 2008, Jakob Voss wrote:

Hi Joe,


You wrote:

On Wed, 16 Jan 2008, Jakob Voss wrote:

Someone just has to define what a 'holding' is and what information it
must carry, so we can define a simple holding interchange format that is
not as fuzzy and overblown as most other library standards. As a sideline
we implement another part of FRBR (a mapping from frbr:manifestation to
frbr:item)


I've been fighting with the issue of what do you return in response to a
query (in the context of federated search systems ... but for scientific
data, not bibliographic) for almost 4 years now.

Although I think FRBR helps to frame the problem, the real issue is that
there are many reasons why someone might ask the question, and without
knowing what they're trying to solve, we don't know what sort of a record
we should be returning.


A holding webservice is not meant to be queried by a human being with
fuzzy information needs in mind. Instead it is just one service to tell
you where an already identified manifestation can be found. If you still
don't know which exact manifestation you want (for instance you don't
mind which edition of a book), then the holding service needs to be
queried for each possible manifestation.


I agree -- and I'm only looking at the API side of things.  In building
the Virtual Solar Observatory <http://virtualsolar.org/>, we ran into the
problem that we didn't clearly define what constituted a 'record' in
response to a query.  And the scientists still can't agree, as it affects
what type of questions can be easily answered ... more granular allows
more specific questions, but less granular makes it easy for the scientist
to filter down the result set to determine their needs.

Those people writing user interfaces to make use of the API need to know
what granularity is being returned by the API, and if necessary,
de-duplicate to make it less granular and more in line with what the user
expects.

So, for instance, to answer the following types of questions, we need
different granularity:

       What stories do you have that I might be interested in?
               (only need 'work')
       What stories do you have that I can understand?
               (language is significant -- need 'expression')
       What stories do you have that are accessible to me?
               (may need characteristics of the packaging, need
               'manifestation')
       What stories do you have that are currently available to me?
               (need attributes of specific physical items)

Technically, we may only need those levels for answering the question, and
then return details at a higher granularity (eg, as I said 'stories', work
may be sufficient)
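
As a rough sketch of the de-duplication step (Python; the record layout,
field names and sample identifiers are made up for illustration and are
not from any real API), an interface that receives item-level records can
collapse them to whichever level the question actually needs:

```python
from collections import OrderedDict

# FRBR group 1 levels, coarsest to finest. The record layout is hypothetical.
LEVELS = ("work", "expression", "manifestation", "item")


def collapse(records, level):
    """De-duplicate item-level records down to the given FRBR level.

    Two records count as the same entity if they agree on every identifier
    from 'work' down to `level`; the first one seen is kept.
    """
    if level not in LEVELS:
        raise ValueError("unknown level: %r" % (level,))
    key_fields = LEVELS[:LEVELS.index(level) + 1]
    seen = OrderedDict()
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Keep only the identifiers that are meaningful at this level.
        seen.setdefault(key, dict(zip(key_fields, key)))
    return list(seen.values())


# Invented sample data: one story, two languages, three packagings, four copies.
records = [
    {"work": "w1", "expression": "w1-en", "manifestation": "w1-en-vhs", "item": "copy-1"},
    {"work": "w1", "expression": "w1-en", "manifestation": "w1-en-vhs", "item": "copy-2"},
    {"work": "w1", "expression": "w1-en", "manifestation": "w1-en-dvd", "item": "copy-3"},
    {"work": "w1", "expression": "w1-de", "manifestation": "w1-de-dvd", "item": "copy-4"},
]

for level in LEVELS:
    print(level, len(collapse(records, level)))
```

The same four item records answer "what stories do you have?" with one
entry and "what can I understand?" with two, which is why the caller has
to know which level the API is returning in the first place.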

We start needing the other levels of detail when a person is trying to
make decisions as they drill down in granularity.

       I've identified that I'd like to read <Work>, what media and/or
       translation is it available in?
               (need a list of expressions, or possibly manifestations)
       I've identified that I'm interested in <expression>, what are my
       options for physical packaging?
               (need a list of manifestations)
       I've identified that I'm interested in <Manifestation>, where can
       I get it from?
               (need a list of items)
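
The drill-down above can be sketched the same way (Python again; the
records, identifiers and the children() helper are hypothetical, just to
show the shape of the question): given an entity the person has already
picked, list the distinct entities one level below it.

```python
# FRBR group 1 levels, coarsest to finest. The record layout is hypothetical.
LEVELS = ("work", "expression", "manifestation", "item")


def children(records, level, identifier):
    """Return (child_level, ids) for the entities directly below `identifier`."""
    idx = LEVELS.index(level)
    if idx == len(LEVELS) - 1:
        raise ValueError("items have no finer level to drill into")
    child_level = LEVELS[idx + 1]
    found = []
    for rec in records:
        # Preserve first-seen order and skip repeats.
        if rec[level] == identifier and rec[child_level] not in found:
            found.append(rec[child_level])
    return child_level, found


# Invented sample data for a single movie.
records = [
    {"work": "movie", "expression": "movie-subtitled",
     "manifestation": "movie-subtitled-vhs", "item": "branch-a-copy-1"},
    {"work": "movie", "expression": "movie-subtitled",
     "manifestation": "movie-subtitled-vhs", "item": "branch-b-copy-1"},
    {"work": "movie", "expression": "movie-subtitled",
     "manifestation": "movie-subtitled-dvd", "item": "branch-a-copy-2"},
    {"work": "movie", "expression": "movie-dubbed",
     "manifestation": "movie-dubbed-dvd", "item": "branch-b-copy-2"},
]

print(children(records, "work", "movie"))
print(children(records, "expression", "movie-subtitled"))
print(children(records, "manifestation", "movie-subtitled-vhs"))
```

Each call corresponds to one of the three questions: which expressions of
the work exist, which packagings of the chosen expression exist, and which
items of the chosen manifestation can be fetched.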

I've been trying to keep the terms rather generic, so they fit the use
cases that I'm dealing with, but as an example for say, someone looking to
get a specific movie:

       Do you have the movie w/ english subtitles or dubbed over
       so I can understand it?
       Is it available on VHS, so I can actually watch it?
       Where do I have to go to get it?

In my specific case, the questions are:
       Is the data in units that are meaningful to me?
               (some are raw sensor recordings, which require calibration
               software that not everyone would have, and even once
               calibrated, the data may not be comparable to other
               instruments;  sometimes lossy compression is acceptable,
               other times, it isn't, depending on what the data is being
               used for)
       Is the data in a format that my tools can make use of?
               (must have the necessary metadata, some tools can't deal
               with 4 dimensional data and need individual data cubes,
               not all tools can read FITS / CDF / HDF / NetCDF /etc.)
       How long will it take me to get the data?
               (if it's available locally, get it locally before trying
               to get it from some other mirror in Europe or Asia)

(and, to make things more complex, I think there's a group 1 entity that's
missing in FRBR -- the concept of 'text' in the scope of the specific
words that are used but without the formatting, so I can de-duplicate at
the translation level, rather than only once pagination and other
typesetting have been applied, at the Expression level.  The best
correlation I can come up with to the problem in terms of bibliographic
records is the question 'Do you have a copy of the King James Bible?')


I don't see the problem here. The King James Bible is a frbr:expression
of the frbr:work Bible or a frbr:work of its own (I never really caught
the difference between frbr:work and frbr:expression). If you ask for
the text of the King James Bible then you ask for a frbr:item of that
work/expression with specific additional characteristics of containing
no formatting but only the text. At http://ebible.org/bible/kjv/ you can
download the King James Bible in different formats - each file is a
frbr:item of its own.


Actually, that's what I thought, too, until I was talking to people at the
last ASIS&T annual meeting, and a few were insistent that a translation
was a new work, and not just a new expression.  As you said you weren't
sure, I'm guessing there's probably more debate on that specific issue
than I realize, as I'm not directly active in the FRBR discussions.

Now, there is mention that expression "excludes aspects of physical form,
such as typeface and page layout if they are not integral to the
intellectual or artistic realization of the work as such", but we then get
to the issue of what is 'integral'.

One example I was given was that of XML formatted documents vs. a
plain text document.  Their argument was that it wasn't on the excluded
list (typeface and page layout), and therefore made a new expression.
I'm willing to assume that it's actually a notation of formatting, which
is excluded ... if you're adding markup after the fact to an unformatted
text.  If you remove formatting from a marked up text, you may be removing
information that is necessary to allow the document to be understandable
(or at least, less misunderstood) by a wider audience.

Expression also includes "mode or medium of expression", and so books on
tape are a separate expression (and some might argue a separate work) from
the printed form of the work.

If the people I was talking to are just the dissidents in the community,
and most people agree that translations are an expression, then that
largely solves the issues I've been having with trying to fit my concepts
/ objects to FRBR.

I think the problem of applying FRBR lies in the lack of authority
files. There is no easy way to link

http://ebible.org/bible/kjv/kjvtxt.zip (Plain text version)

with the general concept of "The King James Bible" because there is no
registry of frbr:work/expressions. In some cases LibraryThing does a
good job of defining works, in others Wikipedia may be a better choice.



We're running into the same issue with data ... I think we're going to
have to track provenance information, and have reformatting software
insert identifiers so we can track individual items to their origin.

The question 'Do you have a copy of the King James Bible?' can be
answered very well with FRBR in two steps:


[trimmed]

If people are going to classify translations as new expressions, that
works, as that's the exact sort of thing I was hoping for ... I guess I
just need to wait until things finally get implemented, and we can see how
many people subscribe to the 'translation is a new work' belief.

... anyway, the point is -- you have to define 'holding', or you can't be
assured that the response to your request is the correct granularity of
information to answer the question you're trying to ask.


Ok, then I'd define a holding as an instance of frbr:item with the
properties "location" (a building, an institution, an URL...),
"identifier" (call-number, item-number, URL...) and "availability"
(available, next week, only on campus, free for download...). As shown
in my ad-hoc example "location" can be nested, but that's not the point.
Defining holding is not the problem - you just have to look at how
holdings are *practically* used in libraries (instead of starting a
theoretical discussion). The problem is more how to get the data out of
library systems.
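
A minimal sketch of that definition (Python; only the three property
names "location", "identifier" and "availability" come from the text
above, while the class names, the manifestation field, the holdings_for()
helper and all sample values are assumptions made for illustration):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Location:
    """A building, institution or URL; may be nested inside another location."""
    name: str
    parent: Optional["Location"] = None

    def path(self) -> str:
        # Walk up the nesting and print outermost first.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


@dataclass
class Holding:
    """One frbr:item, described by the three properties suggested above."""
    manifestation: str  # hypothetical link back to the frbr:manifestation
    location: Location
    identifier: str     # call number, item number, URL ...
    availability: str   # "available", "next week", "only on campus" ...


def holdings_for(manifestation_ids: List[str],
                 holdings: List[Holding]) -> Dict[str, List[Holding]]:
    """Look up holdings for each candidate manifestation, one by one."""
    result: Dict[str, List[Holding]] = {m: [] for m in manifestation_ids}
    for h in holdings:
        if h.manifestation in result:
            result[h.manifestation].append(h)
    return result


# Invented sample data.
library = Location("Example University Library")
stacks = Location("Main stacks", parent=library)
holdings = [
    Holding("edition-1", stacks, "QA 76 .E9", "available"),
    Holding("edition-2", stacks, "QA 76 .E92", "next week"),
    Holding("edition-2", library, "QA 76 .E92 c.2", "only on campus"),
]

for m, found in holdings_for(["edition-1", "edition-2", "edition-3"], holdings).items():
    print(m, [(h.location.path(), h.identifier, h.availability) for h in found])
```

A caller that doesn't care which edition it gets passes every candidate
manifestation, as described earlier in the thread, and gets an empty list
back for the ones nobody holds.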


I probably should stop talking to the theoretical and research folks ...
it did seem much easier when I stuck with the 'functional' in FRBR, and
was just looking at what it would take to implement the model for the
archives I manage ... which gets us back to the practical part:

You need to come to a shared understanding of what you're returning in
response to a 'holdings' request, or the response isn't meaningful ...
which you had already stated, and I probably just confused the matter
further, but was agreeing with you.

-----
Joe Hourcle
