Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Michael B. Klein
I've altered my previous function (https://gist.github.com/1468557) into
something that's pretty much a straight letter-substitution cipher. It
could be turned back into plaintext pretty easily by someone who really
wanted to (by using frequency analysis and other hints like single-letter
words), but I can't imagine anyone going to the trouble over finding aids.
:) This keeps words (and therefore word frequency/distribution) consistent,
even across changes in case. But if you really want it to index
realistically, it would need to be altered to leave common stems (-s, -ies,
-ed, -ing, etc.) alone (assuming the indexer uses some sort of stemming
algorithm).
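
For the curious, a minimal sketch of that kind of cipher (this is not the
gist itself, and it leaves out the suffix-preserving pass just described):

import random
import string

VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)

# One fixed table, built once: vowels map to vowels and consonants to
# consonants, so every occurrence of a word enciphers the same way,
# regardless of case.
_rng = random.Random(1468557)   # any fixed seed makes the cipher repeatable
TABLE = dict(zip(VOWELS, _rng.sample(VOWELS, len(VOWELS))))
TABLE.update(zip(CONSONANTS, _rng.sample(CONSONANTS, len(CONSONANTS))))

def encipher(text):
    out = []
    for ch in text:
        sub = TABLE.get(ch.lower())
        if sub is None:
            out.append(ch)      # digits, punctuation, whitespace untouched
        else:
            out.append(sub.upper() if ch.isupper() else sub)
    return "".join(out)

print(encipher("NASA and Nasa always encipher to the same letters."))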

On Mon, Dec 12, 2011 at 12:06 PM, Brian Tingle <
brian.tingle.cdlib@gmail.com> wrote:

> On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
>
> > Here's a snippet that will completely randomize the contents of an
> > arbitrary string while preserving the general flow (vowels replaced with
> > vowels, consonants replaced with consonants, with case retained in both
> > instances, and digits replaced with digits), and everything else is left
> > alone.
> >
> > https://gist.github.com/1468557  
>
>
> I like the way the output looks, but one problem with the random output is
> that the same word might come out to different values.  The distribution of
> unique words would also be affected; I'm not sure if that would
> impact relevance/searching/index size.  Also, I was sort of hoping to be
> able to have some sort of browsing, so I'm looking for something that is
> like a pronounceable one-way hash.  Maybe if I take the md5 of the
> word, use that as the seed for random, and then run
> your algorithm, NASA would always "hash" to the same thing?
>
> Potential contributors of specimens would have to be okay with the fact
> that a determined person could recreate their original records.  The goal
> is that an end user who might stumble across a random XTF tutorial
> installation would not mistake what they are seeing for a real collection
> description.
>
> Hopefully nothing transforms to a swear word; I guess that is a problem
> with pig latin as well...
>
> Thanks for the feedback and the suggestion.  I'll play with this some
> tonight and see if setting the seed based on the input word works to get
> the same pseudo-random result; it seems like it should.
>


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Peter Noerr
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Sunday, December 11, 2011 3:47 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> Quoting Richard Wallis:
> 
> 
> > You get the impression that the BL "chose a subset of their current
> > bibliographic data to expose as LD" - it was kind of the other way around.
> > Having modeled the 'things' in the British National Bibliography
> > domain (plus those in related domain vocabularies such as VIAF, LCSH,
> > Geonames, Bio, etc.), they then looked at the information held in
> > their [Marc] bib records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of
> "translating X to RDF" projects go on. I begin to wonder what there is in 
> library data that is
> *unique*, and my conclusion is: not much. Books, people, places, topics: they 
> all exist independently
> of libraries, and libraries cannot take the credit for creating any of them. 
> So we should be able to
> say quite a bit about the resources in libraries using shared data points -- 
> and by that I mean, data
> points that are also used by others. So once you decide on a model (as BL 
> did), then it is a matter of
> looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here 
leads me to conclude that linking to existing (or to-be-created) external data 
(ontologies and representations) is a matter of: being sure what the system's 
current user's context is, and being able to modify the external data brought 
into the user's virtual EMU (see below *** before reading further). I think 
Simon is right that "records" will increasingly become virtual in that they 
are composed as needed by this user for this purpose at this time. We already 
see this in practice in many uses, from adding cover art to book MARC records 
to adding summary information to a "management level" report. Being able to 
link from a "book" record to a foaf:person record and a bib:person record and 
extract data elements from each as they are needed right now should not be too 
difficult. As well as a knowledge of the current need, it requires a 
semantically based mapping of the different elements of those "people" 
representations. The neat part is that the total representation for that 
person may be expressed through both foaf: and bib: facets from a single EMU 
which contains all things known about that person, and so our two requests for 
linked data may, and in fact should, be mining the same resource, which will 
translate the data to the format we ask for each time; then we will combine 
those representations back into a single collapsed data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single 
unique EMU for the entity which holds all unique information about it for a 
number of different uses/scenarios/facets/formats. Of course deciding on what 
is unique and what is obtained from some more granular breakdown is another 
issue. (Some experience with this "onion skin" modeling lies deep in my past, 
and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data 
(the EMU) and the transmission form (the data sent to a requesting system when 
it asks for "foaf:person" data). They are different and have different 
requirements. If you are going to allow all these entity data elements to be 
viewed through a "format filter", then we have a mixed model, but basically a 
whole-part relationship between the EMU and the transmission form (e.g. the 
full data set contains the person's current address, but the transmitted 
response sends only the city). Argue amongst yourselves about whether an 
address is a separate entity that is linked to or not - it makes a simple 
example to consider it as part of the EMU.
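
A toy sketch of that whole-part filtering (the EMU and every field and view
name here are invented purely for illustration):

# Hypothetical repository form: the EMU holds everything known about a person.
emu = {
    "name": "Ada Lovelace",
    "address": {"street": "12 St James's Square", "city": "London"},
    "born": "1815",
}

# Transmission forms are filtered views of the same EMU (made-up view names).
VIEWS = {
    "foaf:person": ["name"],
    "bib:person": ["name", "born"],
    "directory": ["name", "address.city"],  # whole-part: only the city goes out
}

def transmit(emu, view):
    out = {}
    for path in VIEWS[view]:
        node = emu
        for key in path.split("."):   # walk dotted paths into nested fields
            node = node[key]
        out[path] = node
    return out

print(transmit(emu, "directory"))  # {'name': 'Ada Lovelace', 'address.city': 'London'}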

All of this requires that we think of the web of data as being composed not of 
static entities whose description is fixed at any snapshot in time, but of 
dynamic ones, in that what two users see of the same entity may be different 
at exactly the same instant. So we need not only a descriptive model 
structure, but also a set of semantic mappings, a context resolution 
transformation, and the system to implement them each time a link to related 
data is followed.

> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about
> quality thereof) and classification designations that libraries provide are 
> an added value, and we
> should do more to make them useful for discovery.
> 
> 
> >
> > I know it is only semantics (no pun intended), but we need to stop
> > using the word 'record' when talking about the future description of 
> > 'things' or
> > entities that are then linked together.   That word has so many built in
> 

Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Brian Tingle
On Mon, Dec 12, 2011 at 12:27 PM, Nate Vack wrote:

> On Mon, Dec 12, 2011 at 2:06 PM, Brian Tingle wrote:
>
> > Potential contributors of specimens would have to be okay with the fact
> > that a determined person could recreate their original records.
>
> To make things simpler, you might just see how many contributors would
> just be OK with the original records, and skip the obfuscation.


true; but I'm also worried about end-user support questions if we end up
having something like an ead-demo.xtf.cdlib.org

plus I'm also using this as an excuse to play with nltk (natural language
toolkit) and learn more python

but yes, I'm sure I'm prematurely optimizing this problem

On Mon, Dec 12, 2011 at 12:48 PM, Joe Hourcle wrote:

> If the list of missions / agencies / etc is rather small, it'd be possible
> to
> just come up with a random list of nouns, and make a sort of secret
> decoder ring, assigning each mission name that needs to be replaced
> with a random (but consistent) word.


This is a great idea.  I think if I reset the pseudo-random seed based on
the input, then I don't even have to worry about keeping a decoder ring,
and it will work with any noun.  As long as the results look so silly that
no end user might mistake them for real records, this might work.

Maybe I'll create an option switch for the text replacement method:
pig-latin, vowel/consonant-sensitive random letters, or random dictionary
word.
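
A minimal sketch of that plan (the noun list and function names are invented,
and the pig-latin branch is left out):

import hashlib
import random
import re

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
NOUNS = ["falcon", "harbor", "maple", "quartz", "tundra", "walnut"]  # stand-ins

def scramble(word, method="letters"):
    # Seed a private RNG from the word itself, so the same input always
    # produces the same pseudo-random output ("NASA" is stable).
    seed = int(hashlib.md5(word.lower().encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    if method == "dictionary":
        return rng.choice(NOUNS)
    if method == "letters":
        out = []
        for ch in word:
            pool = VOWELS if ch.lower() in VOWELS else CONSONANTS
            sub = rng.choice(pool)
            out.append(sub.upper() if ch.isupper() else sub)
        return "".join(out)
    raise ValueError("unknown method: %r" % method)

def greek(text, method="letters"):
    # Replace only alphabetic runs; punctuation, digits and markup survive.
    return re.sub(r"[A-Za-z]+", lambda m: scramble(m.group(0), method), text)

print(greek("NASA Space Shuttle"))
print(greek("NASA Space Shuttle"))  # identical both times, by construction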


[CODE4LIB] formatting citations question

2011-12-12 Thread Shearer, Timothy J
Hi All,

We have a popular service:

http://www.lib.unc.edu/house/citationbuilder/

Essentially it provides web forms based on citation "genre" (journal article,
chapter, monograph), lets users fill them in, and then shows a citation
formatted in various styles.

Regardless of how folks feel about that as a service, I'm interested in
exploring better ways to do it, and why reinvent the wheel?

I'm looking for perspectives on (or existing projects that use)
citeproc-js to process web form input (and potentially also to unpack and
style COinS).
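
For concreteness, a sketch of the flow in question using the citeproc-py port
(Python rather than JavaScript; the package, the style name, and all field
values below are assumptions, with the CSL-JSON item standing in for what the
web form would produce):

from citeproc import (Citation, CitationItem, CitationStylesBibliography,
                      CitationStylesStyle, formatter)
from citeproc.source.json import CiteProcJSON

# One CSL-JSON item, as it might come out of the web form (values invented).
bib_source = CiteProcJSON([{
    'id': 'example1',
    'type': 'article-journal',
    'title': 'An Example Article',
    'author': [{'family': 'Smith', 'given': 'Jane'}],
    'container-title': 'Journal of Examples',
    'issued': {'date-parts': [[2010]]},
}])

style = CitationStylesStyle('harvard1', validate=False)  # any available CSL style
bibliography = CitationStylesBibliography(style, bib_source, formatter.plain)
bibliography.register(Citation([CitationItem('example1')]))

for entry in bibliography.bibliography():
    print(str(entry))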

Thanks for any advice or pointers.

Tim


Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Joe Hourcle
On Dec 12, 2011, at 3:06 PM, Brian Tingle wrote:

> On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
> 
>> Here's a snippet that will completely randomize the contents of an
>> arbitrary string while preserving the general flow (vowels replaced with
>> vowels, consonants replaced with consonants, with case retained in both
>> instances, and digits replaced with digits), and everything else is left alone.
>> 
>> https://gist.github.com/1468557  
> 
> 
> I like the way the output looks, but one problem with the random output is
> that the same word might come out to different values.  The distribution of
> unique words would also be affected; I'm not sure if that would
> impact relevance/searching/index size.  Also, I was sort of hoping to be
> able to have some sort of browsing, so I'm looking for something that is
> like a pronounceable one-way hash.  Maybe if I take the md5 of the
> word, use that as the seed for random, and then run
> your algorithm, NASA would always "hash" to the same thing?

If the list of missions / agencies / etc is rather small, it'd be possible to
just come up with a random list of nouns, and make a sort of secret
decoder ring, assigning each mission name that needs to be replaced
with a random (but consistent) word.
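
A minimal sketch of such a decoder ring (the noun list is a stand-in; a real
one needs at least as many entries as there are names to replace):

import random

NOUNS = ["falcon", "harbor", "maple", "quartz", "tundra", "walnut"]

rng = random.Random(0)   # fixed seed so the same ring can be rebuilt later
pool = NOUNS[:]
rng.shuffle(pool)
ring = {}                # mission/agency name -> replacement noun

def decode_ring(name):
    # The first sighting of a name claims the next unused noun; every later
    # sighting gets the same noun back, so replacements stay consistent.
    if name not in ring:
        ring[name] = pool.pop()
    return ring[name]

print(decode_ring("NASA"), decode_ring("ESA"), decode_ring("NASA"))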

I just tend to replace all of my mission / spacecraft / instrument acronyms
with 'BOGUS' when I have to do similar stuff to generate records when
we're testing data systems, but I tend to just have the acronyms, not
the full spelled out names (which are looked up from the acronyms),
and I don't have large amounts of free text to worry about.

-Joe


Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Nate Vack
On Mon, Dec 12, 2011 at 2:06 PM, Brian Tingle wrote:

> Potential contributors of specimens would have to be okay with the fact
> that a determined person could recreate their original records.

To make things simpler, you might just see how many contributors would
just be OK with the original records, and skip the obfuscation.

-n


Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Brian Tingle
On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:

> Here's a snippet that will completely randomize the contents of an
> arbitrary string while preserving the general flow (vowels replaced with
> vowels, consonants replaced with consonants, with case retained in both
> instances, and digits replaced with digits), and everything else is left alone.
>
> https://gist.github.com/1468557  


I like the way the output looks, but one problem with the random output is
that the same word might come out to different values.  The distribution of
unique words would also be affected; I'm not sure if that would
impact relevance/searching/index size.  Also, I was sort of hoping to be
able to have some sort of browsing, so I'm looking for something that is
like a pronounceable one-way hash.  Maybe if I take the md5 of the
word, use that as the seed for random, and then run
your algorithm, NASA would always "hash" to the same thing?

Potential contributors of specimens would have to be okay with the fact
that a determined person could recreate their original records.  The goal
is that an end user who might stumble across a random XTF tutorial
installation would not mistake what they are seeing for a real collection
description.

Hopefully nothing transforms to a swear word; I guess that is a problem
with pig latin as well...

Thanks for the feedback and the suggestion.  I'll play with this some
tonight and see if setting the seed based on the input word works to get
the same pseudo-random result; it seems like it should.


Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Michael B. Klein
Hi Brian,

Your contributors might not consider Pig Latin, or anything else that can
be easily turned back into plaintext, to be "not releasing their actual
records." :-)

Here's a snippet that will completely randomize the contents of an
arbitrary string while preserving the general flow (vowels replaced with
vowels, consonants replaced with consonants, with case retained in both
instances, and digits replaced with digits), and everything else is left alone.

https://gist.github.com/1468557
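
As a rough sketch of the approach (this is not the gist itself, just the
behavior described above):

import random
import string

VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)

def greek_char(c):
    # Vowel -> random vowel, consonant -> random consonant, digit -> random
    # digit; case is preserved; everything else passes through untouched.
    if c.lower() in VOWELS:
        out = random.choice(VOWELS)
    elif c.lower() in CONSONANTS:
        out = random.choice(CONSONANTS)
    elif c.isdigit():
        return random.choice(string.digits)
    else:
        return c
    return out.upper() if c.isupper() else out

def greek(text):
    return "".join(greek_char(c) for c in text)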

Here's your NASA sample run through the randomizer:

Vny RUPY Xsase Pwuccpo Lnipbaxjew fipewsof eqfugvof if Xeleufe 60, 1295
wtos Mvimo Jlehcve Lbobvezbyh vlozi odohl 77 cyfuzbq ilne ybl sponsf,
meojacz gu cmi piyngf ed abr fotor gloc cumcetj. Ruzildasfebaod if fdu
ejsosa rumozzi ginaq arhan or A-pont kaon ew eqv jejlk vutuq kalsaj roumhyl
teopyf is midqokz. Kda mitoxhuh rugoxhal on pxu pelqeseul az msu Tawivg
Luwjutmaol, i mqubyip wulvyffaak evviivhek qe Afykox Cfaron Mkefyfipq
Kybuvz Riufyl ba awwevrogixe bde uhliwekp. Hsu Gqugydatgyyp Qemgybmuix
diytr tvix VYXE'h irjybefakiyzil cibkeco udx numojuaf-pogezn dquziqpyb fod
heip a fee lannjuluxymk qejvet la vmy ymriqexc. BUJI fegucuzz syj wviwx
wmin cyvvgintoj Jufhyq Gnoeham'v dosyzv ar xzy detib xyzvyf raazkapk
lizniutyp u cypimsiufte zetesjzesmam dgyj ag cki U-juzrm, dys gnai jausul
gi iqlbyhf es ksumapfu. Bsau ittu qojsarahlih mozpyhbb dpon okxotuosd ebuih
cde xoqhewd ow koahznygl xuwoh by xce huf jujjybexohyp og xjoc gagnysx.

On Fri, Dec 9, 2011 at 3:17 PM, BRIAN TINGLE <
brian.tingle.cdlib@gmail.com> wrote:

> Hi,
>
> I'm now in the group that produces XTF, and for XTF4.0, I'm thinking about
> updating the EAD XSLT based on the Online Archive of California's
> stylesheets.
>
> For our EAD samples that we distribute with the XTF tutorial, we are using
> 6 EAD files from the Library of Congress (which presumably are public
> domain).
>
> I'd like to start off a collection of pathological EAD examples that we
> have the rights to redistribute with the XTF tutorials and to use for
> testing.
>
> Anticipating that potential contributors might not want to release their
> actual records for inclusion in an open source project; I hacked a little
> script to systematically change names and nouns to pig latin
>
> https://gist.github.com/1429538
>
> Here is a sample run:
>
> Input: (from http://www.oac.cdlib.org/findaid/ark:/13030/kt3580374v/ )
>
> The NASA Space Shuttle Challenger disaster occurred on January 28, 1986
> when Space Shuttle Challenger broke apart 73 seconds into its flight,
> leading to the deaths of its seven crew members. Disintegration of the
> entire vehicle began after an O-ring seal in its right solid rocket booster
> failed at liftoff. The disaster resulted in the formation of the Rogers
> Commission, a special commission appointed by United States President
> Ronald Reagan to investigate the accident. The Presidential Commission
> found that NASA's organizational culture and decision-making processes had
> been a key contributing factor to the accident. NASA managers had known
> that contractor Morton Thiokol's design of the solid rocket boosters
> contained a potentially catastrophic flaw in the O-rings, but they failed
> to address it properly. They also disregarded warnings from engineers about
> the dangers of launching posed by the low temperatures of that morning.
>
> output:
>
> The Nasaay Acespay Uttleshay Allengerchay isasterday occurred on Anuaryjay
> 28, 1986 when Acespay Uttleshay Allengerchay okebray apartway 73 econdsays
> into its flight, leading to the eathdays of its seven ewcray embermays.
> Isintegrationday of the entire ehiclevay began after an O-ring ealsay in
> its ightray solid ocketray oosterbay failed at iftofflay. The isasterday
> resulted in the ormationfay of the Ogersray Ommissioncay, a special
> ommissioncay appointed by Itedunay States Esidentpray Onaldray Eaganray to
> investigate the accidentway. The Esidentialpray Ommissioncay found that
> Nasaay's organizational ulturecay and decision-making ocessprays had been a
> key ontributingcay actorfay to the accidentway. Nasaay anagermays had known
> that ontractorcay Ortonmay Iokolthay's esignday of the solid ocketray
> oosterbays contained a potentially catastrophic awflay in the ingO-rays,
> but they failed to addressway it properly. They also disregarded arningways
> from engineerways about the angerdays of launching posed by the low
> emperaturetays of that orningmay.
>
> Does anyone have any thoughts or feedback on this?  Is this totally silly?
>  Is there something besides pig latin that I could transform the words to?
>  Any obvious ways I could improve the python?
>
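
As a rough sketch of the pig-latin transform being discussed (Brian's gist,
https://gist.github.com/1429538, may differ in its details):

import re

VOWELS = "aeiouAEIOU"

def pig_latin_word(word):
    # Vowel-initial words take "-way"; otherwise the leading consonant
    # cluster moves to the end, followed by "ay".
    if word[0] in VOWELS:
        out = word + "way"
    else:
        onset, rest = re.match(r"([^aeiouAEIOU]+)(.*)", word).groups()
        out = rest + onset.lower() + "ay"
    # Carry the original capitalization over, as in the sample output above.
    return out.capitalize() if word[0].isupper() else out

def pig_latin(text):
    # Transform only alphabetic runs, leaving punctuation and digits alone.
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group(0)), text)

print(pig_latin("Space Shuttle Challenger"))  # Acespay Uttleshay Allengerchay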


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Karen Coyle

Quoting Owen Stephens:


To be provocative - has the time come for us to abandon the idea  
that 'libraries' act as one where cataloguing is concerned, and our  
metadata serves the same purpose in all contexts? (I can't decide if  
I'm serious about this or not!)


I'm having "deep thoughts" about the logic of our current concept of  
cataloging, but nothing clear enough to even blog about. Let me just  
say that I'm not at all sure what we would lose if we didn't do  
"cataloging" as it is known today.


kc



Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:


Quoting Richard Wallis:



You get the impression that the BL "chose a subset of their current
bibliographic data to expose as LD" - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.


Richard, I've been thinking of something along these lines myself,  
especially as I see the number of "translating X to RDF" projects  
go on. I begin to wonder what there is in library data that is  
*unique*, and my conclusion is: not much. Books, people, places,  
topics: they all exist independently of libraries, and libraries  
cannot take the credit for creating any of them. So we should be  
able to say quite a bit about the resources in libraries using  
shared data points -- and by that I mean, data points that are also  
used by others. So once you decide on a model (as BL did), then it  
is a matter of looking *outward* for the data to re-use.


I maintain, however, as per my LITA Forum talk [1] that the subject  
headings (without talking about quality thereof) and classification  
designations that libraries provide are an added value, and we  
should do more to make them useful for discovery.





I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


I'll let you battle that one out with Simon :-), but I am often at  
a loss for a better term to describe the unit of metadata that  
libraries may create in the future to describe their resources.  
Suggestions highly welcome.


kc
[1] http://kcoyle.net/presentations/lita2011.html





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet






--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


[CODE4LIB] units of metadata, was Namespace management, was Models of MARC in RDF

2011-12-12 Thread A. Soroka
On Dec 11, 2011, at 11:00 PM, Karen Coyle wrote:

> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.


I'm sure you're aware of these, but for general edification here are some 
possible ways to think about an "implicit record":

Concise Bounded Description: http://www.w3.org/Submission/CBD, or better for 
libraries IMHO, Symmetric Concise Bounded Description: 
http://www.w3.org/Submission/CBD/#scbd

The Minimum Self-contained Graph ("MSG"), details of which are available in 
"Signing individual fragments of an RDF graph" (ACM WWW '05), as well as in 
"RDFSync: efficient remote synchronization of RDF models" (ISWC 2007 + ASWC 2007).
http://www.www2005.org/cdrom/docs/p1020.pdf
http://data.semanticweb.org/pdfs/iswc-aswc/2007/ISWC2007_RT_Tummarello(1).pdf

RDF Molecules, details in "Tracking RDF Graph Provenance using RDF Molecules," 
from (ISWC '05).
http://aisl.umbc.edu/resources/178.pdf
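
For the first of these, a minimal sketch of (non-symmetric) CBD extraction
with rdflib; this illustrates the definition rather than implementing the
full submission, and it skips the reification step the spec also requires:

from rdflib import BNode, Graph, URIRef

def cbd(source, resource, seen=None):
    # Concise Bounded Description: every statement whose subject is the
    # resource, recursing into blank-node objects.
    seen = set() if seen is None else seen
    out = Graph()
    if resource in seen:          # guard against blank-node cycles
        return out
    seen.add(resource)
    for s, p, o in source.triples((resource, None, None)):
        out.add((s, p, o))
        if isinstance(o, BNode):
            out += cbd(source, o, seen)
    return out

# e.g. cbd(g, URIRef("http://example.org/some/resource"))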

All of these are basically defined as pieces of larger graphs, although they 
can be considered as conditions of some kind of "validity" for a graph. I 
suspect that part of the hurdle for our community in moving to new patterns of 
work is the gap between current workflows (which create records) and future 
workflows (which may enrich shared graphs by much smaller increments than 
current notions of "record"). "[T]he unit of metadata that libraries may 
create in the future", as the unit in a given workflow, may be as small as a 
single triple.

---
A. Soroka
Online Library Environment
the University of Virginia Library




On Dec 11, 2011, at 11:00 PM, Karen Coyle wrote:

>> I know it is only semantics (no pun intended), but we need to stop using the 
>> word 'record' when talking about the future description of 'things' or 
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
> 
> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
"Richard Wallis"  wrote:
> Collection of triples?

Yes, no baggage there ... :) Some of us are doing this completely without a
single triplet, so I'm not sure it is accurate or even politically correct.
*hehe*

> A classic example of only being able to describe/understand the future in
> the terms of your past experience.

Yes, exactly. Although, having said that, I'm excited that the library
world is finally taking the semantic challenge seriously. It's taken quite
a number of years, but slowly there are a few drips and drabs happening.
Here's hoping that there's a sluice somewhere about to open fully, and
maybe the RDA vehicle has proper wheels? (It didn't the last time I checked,
but that's admittedly a couple of years back. I hear they at least got new
suspension?)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 12 December 2011 11:16, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> "Richard Wallis"  wrote:
> > Your are not the only one who is looking for a better term for what is
> > being created - maybe we should hold a competition to come up with one.
>
> A "named graph" gets thrown around a lot, and even though this is
> technically correct, it's neither nice nor sexy.
>

It also carries lots of baggage from the Linked Data/Triple store
communities that would get in the way.

>
> In my past a "bucket" was much used, as you can easily thrown things in or
> take it out (as opposed to the more terminal record being set), however
> people have a problem with the conceptual size of said bucket, which more
> or less summarizes why this term is so hard to pin down.
>

Yes, most would assume that a bucket would be the place to put their [think
of a better word than] records.


>
> I have, however, seen some revert to the old RDBMS world of "rows", as they
> talk about properties on the same line, just thinking of the line as more
> flexible than it used to be, but we'll see if it sticks around.
>

Collection of triples?


> Personally I think the problem is that people *like* the idea of a closed
> little silo that is perfectly contained, no matter if it is technically
> true or not, and therefore futile. This is also why, I think, it's been so
> hard to explain to more traditional developers the amazing advantages you
> get through true semantic modelling; people find it hard to let go of a
> pattern that has helped them so in the past.
>

A classic example of only being able to describe/understand the future in
the terms of your past experience.


> Breaking the meta data out of the wonderful constraints of a MARC record?
> FRBR/RDA will never fly, at least not until they all realize that the
> constraints are real and that they truly and utterly constrain not just the
> meta data but the future field of librarying ... :)
>

:-)

~Richard.
-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
"Richard Wallis"  wrote:
> Your are not the only one who is looking for a better term for what is
> being created - maybe we should hold a competition to come up with one.

A "named graph" gets thrown around a lot, and even though this is
technically correct, it's neither nice nor sexy.

In my past a "bucket" was much used, as you can easily thrown things in or
take it out (as opposed to the more terminal record being set), however
people have a problem with the conceptual size of said bucket, which more
or less summarizes why this term is so hard to pin down.

I have, however, seen some revert to the old RDBMS world of "rows", as they
talk about properties on the same line, just thinking of the line as more
flexible than it used to be, but we'll see if it sticks around.
Personally I think the problem is that people *like* the idea of a closed
little silo that is perfectly contained, no matter if it is technically
true or not, and therefore futile. This is also why, I think, it's been so
hard to explain to more traditional developers the amazing advantages you
get through true semantic modelling; people find it hard to let go of a
pattern that has helped them so in the past.

Breaking the meta data out of the wonderful constraints of a MARC record?
FRBR/RDA will never fly, at least not until they all realize that the
constraints are real and that they truly and utterly constrain not just the
meta data but the future field of librarying ... :)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 11 December 2011 23:47, Karen Coyle wrote:

> Quoting Richard Wallis:
>
>
>  You get the impression that the BL "chose a subset of their current
>> bibliographic data to expose as LD" - it was kind of the other way around.
>> Having modeled the 'things' in the British National Bibliography domain
>> (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
>> Bio, etc.), they then looked at the information held in their [Marc] bib
>> records to identify what could be extracted to populate it.
>>
>
> Richard, I've been thinking of something along these lines myself,
> especially as I see the number of "translating X to RDF" projects go on. I
> begin to wonder what there is in library data that is *unique*, and my
> conclusion is: not much. Books, people, places, topics: they all exist
> independently of libraries, and libraries cannot take the credit for
> creating any of them. So we should be able to say quite a bit about the
> resources in libraries using shared data points -- and by that I mean, data
> points that are also used by others. So once you decide on a model (as BL
> did), then it is a matter of looking *outward* for the data to re-use.
>

Yes!



>
> I maintain, however, as per my LITA Forum talk [1] that the subject
> headings (without talking about quality thereof) and classification
> designations that libraries provide are an added value, and we should do
> more to make them useful for discovery.
>
>
The wider world is always looking for good ways to categorise things.  The
library community should make it easy for others to utilise their rich
heritage of such things. LCSH is an obvious candidate, as is VIAF, amongst
others.  The easier we make it, the more uptake there will be and the more
inbound links into library resources we will get.  By easier, I am
suggesting that efforts to map these library concepts (where they fit) to
their wider-world equivalents found in places like DBpedia, the New York
Times, and Geonames will greatly enhance the use and visibility of library
resources.
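
In linked data terms such a mapping can be as small as one triple per
concept. A toy rdflib sketch (both identifiers are invented, not checked
against id.loc.gov or DBpedia):

from rdflib import Graph, Namespace, URIRef

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

g = Graph()
# Illustrative identifiers only: an LCSH concept linked to a DBpedia resource.
lcsh_concept = URIRef("http://id.loc.gov/authorities/subjects/sh1234567")
dbpedia_resource = URIRef("http://dbpedia.org/resource/Example_Topic")
g.add((lcsh_concept, SKOS.closeMatch, dbpedia_resource))

print(g.serialize(format="turtle"))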


>
>
>> I know it is only semantics (no pun intended), but we need to stop using
>> the word 'record' when talking about the future description of 'things' or
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
>>
>
> I'll let you battle that one out with Simon :-), but I am often at a loss
> for a better term to describe the unit of metadata that libraries may
> create in the future to describe their resources. Suggestions highly
> welcome.
>

You are not the only one who is looking for a better term for what is
being created - maybe we should hold a competition to come up with one.



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
On 11 Dec 2011, at 23:30, Richard Wallis wrote:

> 
> There is no document I am aware of, but I can point you at the blog post by
> Tim Hodson [
> http://consulting.talis.com/2011/07/british-library-data-model-overview/]
> who helped the BL get to grips with and start thinking Linked Data.
> Another by the BL's Neil Wilson [
> http://consulting.talis.com/2011/10/establishing-the-connection/] filling
> in the background around his recent presentations about their work.

Neil Wilson at the BL has indicated a few times that in principle the BL has no 
problem sharing the software they used to extract the relevant data from the 
MARC records, but that there are licensing issues around the s/w due to the use 
of a proprietary compiler (sorry, I don't have any more details, so I can't 
explain any more than this). I'm not sure whether this extends to sharing the 
source that would tell us what exactly was happening, but I think this would be 
worth more discussion with Neil - I'll try to pursue it with him when I get a 
chance.

Owen


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
The other issue that the 'modelling' brings (IMO) is that the model influences 
use - or better the other way round, the intended use and/or audience should 
influence the model. This raises questions for me about the value of a 
'neutral' model - which is what I perceive libraries as aiming for - treating 
users as a homogeneous mass with needs that will be met by a single approach. 
Obviously there are resource implications to developing multiple models for 
different uses/audiences, and once again I'd argue that an advantage of the 
linked data approach is that it allows for the effort to be distributed amongst 
the relevant communities.

To be provocative - has the time come for us to abandon the idea that 
'libraries' act as one where cataloguing is concerned, and our metadata serves 
the same purpose in all contexts? (I can't decide if I'm serious about this or 
not!)

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:

> Quoting Richard Wallis:
> 
> 
>> You get the impression that the BL "chose a subset of their current
>> bibliographic data to expose as LD" - it was kind of the other way around.
>> Having modeled the 'things' in the British National Bibliography domain
>> (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
>> Bio, etc.), they then looked at the information held in their [Marc] bib
>> records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of "translating X to RDF" projects go on. I begin to 
> wonder what there is in library data that is *unique*, and my conclusion is: 
> not much. Books, people, places, topics: they all exist independently of 
> libraries, and libraries cannot take the credit for creating any of them. So 
> we should be able to say quite a bit about the resources in libraries using 
> shared data points -- and by that I mean, data points that are also used by 
> others. So once you decide on a model (as BL did), then it is a matter of 
> looking *outward* for the data to re-use.
> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about quality thereof) and classification designations that 
> libraries provide are an added value, and we should do more to make them 
> useful for discovery.
> 
> 
>> 
>> I know it is only semantics (no pun intended), but we need to stop using
>> the word 'record' when talking about the future description of 'things' or
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
> 
> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.
> 
> kc
> [1] http://kcoyle.net/presentations/lita2011.html
> 
> 
> 
> 
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet