Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Peter,

On 12 December 2011 22:11, Peter Noerr pno...@museglobal.com wrote:

Trying to synthesize what Karen, Richard and Simon have bombarded us with here leads me to conclude that linking to existing (or to-be-created) external data (ontologies and representations) is a matter of: being sure what the system's current user's context is, and being able to modify the external data brought into the user's virtual EMU (see below *** before reading further).

Sorry for the bombarding ;-)

Being sure what the system's current user's context is sounds like a nice idea, but when you are publishing data you have little control over, and even less knowledge of, the consuming 'user' and their context. Taking things to the next level, by building services and applications for users, you hopefully will have some understanding of the virtual and actual users' contexts and can take [what I like to call editorial] decisions about how much data in what format to deliver to them, and which links to follow to enrich your service.

So, back down at the data level: model your domain to include all the information you are aware of for the entities you are describing, plus link them to other domains that can enrich those descriptions. Leave it to the consumers of your data to decide what is best for them in their context.

I think Simon is right that records will increasingly become virtual in that they are composed as needed by this user for this purpose at this time.

Yes - you could envisage, for some domains, a minimalistic description of a resource being sufficient in the form of a single triple:

http://mylib.org/resource/12345 owl:sameAs http://bnb.data.bl.uk/id/resource/008740700 .

I think Simon (maybe Richard, maybe all of you) was working towards a single unique EMU for the entity which holds all unique information about it for a number of different uses/scenarios/facets/formats.
Of course deciding on what is unique and what is obtained from some more granular breakdown is another issue. (Some experience with this onion-skin modeling lies deep in my past, and may need dredging up.)

I am suggesting that you, in your domain/catalog/library, would probably assign a unique identifier, in your domain, for each of the things you describe:

http://mylib.org/resource/12345
http://mylib.org/person/CarpenterEdward1910-1998

Describe those things:

http://mylib.org/resource/12345 rdf:type bibo:Book .
http://mylib.org/person/CarpenterEdward1910-1998 foaf:name "Edward Carpenter" .

Describe the relationships between those things:

http://mylib.org/resource/12345 dct:creator http://mylib.org/person/CarpenterEdward1910-1998 .

Then link them to external descriptions of the same concepts:

http://mylib.org/resource/12345 owl:sameAs http://bnb.data.bl.uk/id/resource/008740700 .
http://mylib.org/person/CarpenterEdward1910-1998 owl:sameAs http://viaf.org/viaf/53127337 .

That way you end up with internal identifiers that you can link to from things like comments, circulation records, physical location information, etc. These are then linked out to distributed descriptions which you, or consumers of your data, can then merge with your data to provide richer information. I know the above examples are a bit simplistic, but nevertheless they could be near good enough for some use cases.

*** I suggest (and use above) the Entity Metadata Unit = EMU. This contains the totality of unique information stored about this entity in this single logical location.

In my current location, and the current economic climate, I am wary of an acronym the same as European Monetary Union. ;-) However, I think you are thinking in the right direction - I am resigning myself to just using the word 'description'.

~Richard.
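The mint-describe-link pattern above can be sketched in a few lines of plain Python (no RDF library assumed; triples are tuples in a set). The `dct:title` triple and the merge helper are illustrative additions, not part of the thread:

```python
# Local description: mint internal identifiers, describe and relate them,
# then link out with owl:sameAs, exactly as in the examples above.
local = {
    ("http://mylib.org/resource/12345", "rdf:type", "bibo:Book"),
    ("http://mylib.org/resource/12345", "dct:creator",
     "http://mylib.org/person/CarpenterEdward1910-1998"),
    ("http://mylib.org/person/CarpenterEdward1910-1998", "foaf:name",
     "Edward Carpenter"),
    ("http://mylib.org/resource/12345", "owl:sameAs",
     "http://bnb.data.bl.uk/id/resource/008740700"),
}

# Hypothetical triples a consumer might fetch from the external BNB URI.
external = {
    ("http://bnb.data.bl.uk/id/resource/008740700", "dct:title",
     "Towards Democracy"),
}

def merged_description(subject, triples):
    """Collect every triple about `subject`, following owl:sameAs one hop."""
    aliases = {subject} | {o for s, p, o in triples
                           if s == subject and p == "owl:sameAs"}
    return {(s, p, o) for s, p, o in triples if s in aliases}

# The consumer's richer view: local data merged with the linked description.
desc = merged_description("http://mylib.org/resource/12345", local | external)
```

The point of the sketch is that the merge is the consumer's decision: the publisher only asserts the `owl:sameAs` link, and anyone can choose whether to follow it.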
--
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Simon Spero s...@unc.edu:

On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis richard.wal...@talis.com wrote:

However, I think you are thinking in the right direction - I am resigning myself to just using the word 'description'.

Q: In your definition, can *descriptions* be put into 1:1 correspondence with records (where a record is an atomic asserted set of propositions about a resource)?

Yes, I realize that you were asking Richard, but I'm a bit forward, as we know. I do NOT see a description as atomic in the sense that a record is atomic. A record has rigid walls; a description has permeable ones. A description always has the POTENTIAL to have a bit of unexpected data added; a record cuts off that possibility.

That said, I am curious about the permeability of the edges of a named graph. I don't know their degree of rigidity in terms of properties allowed.

kc

--
Karen Coyle
kco...@kcoyle.net
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Being no longer in Europe, I had completely missed the current hot-potato definition of EMU. But it had a nice feel to it (sigh).

I agree with Karen below that a record seems more bounded and static, whereas a description varies according to need. And that is the distinction I was trying to get at: that the item stored in some database is everything unique about that entity - and is static, until some data actually changes - whereas the description is built at run time for the user and may contain some data from the item record, and some aggregated from other, linked, item records. The records all have long-term existence in databases and the like, whereas the description is a view of all that stored data appropriate for the moment. It will only be stored as a processing intermediate result (as a record, since its contents are now fixed), and not long term, since it would be broken up into bits of entity data and stored in a distributed linked fashion (much like, as I understand it, the BL did when reading MARC records and storing them as entity updates).

Having said all that, I don't like the term description as it carries a lot of baggage, as do all the other terms. But I'm stuck for another one.

Peter

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Tuesday, December 13, 2011 12:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

Quoting Simon Spero s...@unc.edu:

On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis richard.wal...@talis.com wrote:

However, I think you are thinking in the right direction - I am resigning myself to just using the word 'description'.

Q: In your definition, can *descriptions* be put into 1:1 correspondence with records (where a record is an atomic asserted set of propositions about a resource)?

Yes, I realize that you were asking Richard, but I'm a bit forward, as we know.
I do NOT see a description as atomic in the sense that a record is atomic. A record has rigid walls; a description has permeable ones. A description always has the POTENTIAL to have a bit of unexpected data added; a record cuts off that possibility.

That said, I am curious about the permeability of the edges of a named graph. I don't know their degree of rigidity in terms of properties allowed.

kc
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Simon,

You wrote:

Q: In your definition, can *descriptions* be put into 1:1 correspondence with records (where a record is an atomic asserted set of propositions about a resource)?

I do not believe so, especially when referencing back to where we started - the MARC record. A MARC record, more often than not, contains propositions about many things:

* The book itself (let's assume that's what the record is about) - ISBN, number of pages, cost, format, shelf location
* The author - name, birth/death date
* The publisher - name, location
* Publication event - date, publisher, location
* Subject(s)

In my view this record contains information to populate 5 or more separate descriptions, plus the related links between them.

On Tue, Dec 13, 2011 at 3:22 PM, Karen Coyle li...@kcoyle.net wrote:

Yes, I realize that you were asking Richard, but I'm a bit forward, as we know.

Karen, thanks for diving in ;-)

I do NOT see a description as atomic in the sense that a record is atomic. A record has rigid walls, a description has permeable ones. A description always has the POTENTIAL to have a bit of unexpected data added; a record cuts off that possibility.

Yes. Take the author thing from above. It may have its basic, MARC-record-derived information enhanced, by merging with external resources, to add an author's website or image.

That said, I am curious about the permeability of the edges of a named graph. I don't know their degree of rigidity in terms of properties allowed.

Named graphs were supposed to be invariant under the original proposal; there is a lot of mess around the semantics right now. Dan Brickley wrote a very nice example: http://danbri.org/words/2011/11/03/753 .

As per the comments on Dan's blog, it is dangerous to jump on named graphs as the solution to perceived problems. If I wanted to load RDF from three separate libraries into a triple store I would assign them to three named graphs, but then probably query the default global graph giving a merged view.
Using named graphs to try to recreate our original source record seems to defeat the [opening up] purpose of moving to Linked Data modeling in the first place. I also think it would add in a layer of complexity without an obvious justifying data consumer use case.

~Richard
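Richard's named-graph suggestion can be sketched in plain Python (a toy stand-in for a triple store, not a real one; the graph URIs and triples are invented): each library's RDF goes into its own named graph, and a query against the merged default graph sees one combined resource.

```python
# Three hypothetical libraries' data, each loaded into its own named graph.
store = {
    "http://example.org/graph/lib-a": {
        ("ex:book1", "dct:title", "Moby Dick"),
    },
    "http://example.org/graph/lib-b": {
        ("ex:book1", "dct:creator", "Herman Melville"),
    },
    "http://example.org/graph/lib-c": {
        ("ex:book1", "dct:title", "Moby Dick"),  # duplicate assertion merges away
    },
}

def default_graph(store):
    """The merged view: the union of every named graph in the store."""
    merged = set()
    for triples in store.values():
        merged |= triples
    return merged

# Provenance is kept per named graph, but the merged query sees one resource.
merged = default_graph(store)
```

This mirrors the SPARQL convention of querying the default graph built as the union of the named graphs: you keep per-source provenance without forcing consumers to care about it.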
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 13 December 2011 22:17, Peter Noerr pno...@museglobal.com wrote:

I agree with Karen below that a record seems more bounded and static, whereas a description varies according to need. And that is the distinction I was trying to get at: that the item stored in some database is everything unique about that entity - and is static, until some data actually changes - whereas the description is built at run time for the user and may contain some data from the item record, and some aggregated from other, linked, item records. The records all have long-term existence in databases and the like, whereas the description is a view of all that stored data appropriate for the moment. It will only be stored as a processing intermediate result (as a record, since its contents are now fixed), and not long term, since it would be broken up into bits of entity data and stored in a distributed linked fashion (much like, as I understand it, the BL did when reading MARC records and storing them as entity updates).

Yes. However those descriptions have the potential to be as permanent as the records that they were derived from. As in the BL's case, where the RDF is stored, published and queried in [Talis] Kasabi.com: http://kasabi.com/dataset/british-national-bibliography-bnb

Having said all that, I don't like the term description as it carries a lot of baggage, as do all the other terms. But I'm stuck for another one.

Me too. I'm still searching for a budget-airline term - no baggage!

~Richard.
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Richard Wallis
Sent: Tuesday, December 13, 2011 3:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

On 13 December 2011 22:17, Peter Noerr pno...@museglobal.com wrote:

I agree with Karen below that a record seems more bounded and static, whereas a description varies according to need. And that is the distinction I was trying to get at: that the item stored in some database is everything unique about that entity - and is static, until some data actually changes - whereas the description is built at run time for the user and may contain some data from the item record, and some aggregated from other, linked, item records. The records all have long-term existence in databases and the like, whereas the description is a view of all that stored data appropriate for the moment. It will only be stored as a processing intermediate result (as a record, since its contents are now fixed), and not long term, since it would be broken up into bits of entity data and stored in a distributed linked fashion (much like, as I understand it, the BL did when reading MARC records and storing them as entity updates).

Yes. However those descriptions have the potential to be as permanent as the records that they were derived from. As in the BL's case, where the RDF is stored, published and queried in [Talis] Kasabi.com: http://kasabi.com/dataset/british-national-bibliography-bnb

I would argue that they are stored permanently as multiple records holding the data about each of the individual entities derived from the original single MARC record. In my mind (for this discussion) anything that is stored is a record. It may be a single agglutinative record such as MARC, or the same data may be split amongst records for the work, the author, the subjects, the physical instance, the referenced people, etc.
But the data for each of those is stored in a record unique to that entity (or in records for other entities linked to that entity), so the whole data set of attributes gets spread around as fields in various records about various entities - and the links between them; let us not forget the very real importance of the links for carrying data.

When a user wants to view the information about this title, then a description is assembled from all the stored records and presented to the user. It is, almost by definition (as I am viewing this), an ephemeral view (a virtual record - one which is not stored complete anywhere) for this user. If the user stores this record in a store using the same mechanisms and data model, then the constituent data values will be dispersed to their entity records again. (If the user wants to process the record, then it may well be stored as a whole, since it contains all the information needed for whatever the current task is, and the processed record may be discarded or stored permanently again in a linked-data net as data values in various entity records within that model. Or it may be stored whole in an old-fashioned record-oriented database.)

Having said all that, I don't like the term description as it carries a lot of baggage, as do all the other terms. But I'm stuck for another one.

Me too. I'm still searching for a budget-airline term - no baggage!

How about something based on Southwest - where bags fly free! Though I can't make any sort of acronym starting with SW!

~Richard.
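Peter's "virtual record" idea - entity data stored once per entity, with the description assembled on demand by following links - can be sketched in plain Python. The entity IDs and fields here are invented for illustration:

```python
# Long-term storage: one record per entity, linked by identifier.
entities = {
    "work:1":   {"title": "Towards Democracy", "creator": "person:1"},
    "person:1": {"name": "Edward Carpenter", "born": "1844"},
}

def describe(entity_id, entities, depth=1):
    """Build an ephemeral description, pulling in linked entities to `depth`."""
    record = dict(entities[entity_id])        # copy: the stored record is untouched
    if depth > 0:
        for key, value in record.items():
            if value in entities:             # the value is a link to another entity
                record[key] = describe(value, entities, depth - 1)
    return record

# `view` is assembled for this request and never stored; the entity
# records remain the long-term units of storage.
view = describe("work:1", entities)
```

The `depth` parameter is where the "built for this user, for this purpose, at this time" part lives: a management report might ask for depth 0, a full catalog display for depth 2.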
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
The other issue that the 'modelling' brings (IMO) is that the model influences use - or, better, the other way round: the intended use and/or audience should influence the model. This raises questions for me about the value of a 'neutral' model - which is what I perceive libraries as aiming for - treating users as a homogeneous mass with needs that will be met by a single approach. Obviously there are resource implications to developing multiple models for different uses/audiences, and once again I'd argue that an advantage of the linked data approach is that it allows for the effort to be distributed amongst the relevant communities.

To be provocative - has the time come for us to abandon the idea that 'libraries' act as one where cataloguing is concerned, and our metadata serves the same purpose in all contexts? (I can't decide if I'm serious about this or not!)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:

Quoting Richard Wallis richard.wal...@talis.com:

You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [Marc] bib records to identify what could be extracted to populate it.

Richard, I've been thinking of something along these lines myself, especially as I see the number of translating-X-to-RDF projects go on. I begin to wonder what there is in library data that is *unique*, and my conclusion is: not much. Books, people, places, topics: they all exist independently of libraries, and libraries cannot take the credit for creating any of them.
So we should be able to say quite a bit about the resources in libraries using shared data points -- and by that I mean, data points that are also used by others. So once you decide on a model (as BL did), then it is a matter of looking *outward* for the data to re-use.

I maintain, however, as per my LITA Forum talk [1] that the subject headings (without talking about quality thereof) and classification designations that libraries provide are an added value, and we should do more to make them useful for discovery.

I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world. I'll let you battle that one out with Simon :-), but I am often at a loss for a better term to describe the unit of metadata that libraries may create in the future to describe their resources. Suggestions highly welcome.

kc

[1] http://kcoyle.net/presentations/lita2011.html
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 11 Dec 2011, at 23:30, Richard Wallis wrote:

There is no document I am aware of, but I can point you at the blog post by Tim Hodson [http://consulting.talis.com/2011/07/british-library-data-model-overview/], who helped the BL get to grips with and start thinking Linked Data. Another by the BL's Neil Wilson [http://consulting.talis.com/2011/10/establishing-the-connection/] fills in the background around his recent presentations about their work.

Neil Wilson at the BL has indicated a few times that in principle the BL has no problem sharing the software they used to extract the relevant data from the MARC records, but that there are licensing issues around the s/w due to the use of a proprietary compiler (sorry, I don't have any more details so I can't explain any more than this). I'm not sure whether this extends to sharing the source that would tell us what exactly was happening, but I think this would be worth more discussion with Neil - I'll try to pursue it with him when I get a chance.

Owen
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 11 December 2011 23:47, Karen Coyle li...@kcoyle.net wrote:

Quoting Richard Wallis richard.wal...@talis.com:

You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [Marc] bib records to identify what could be extracted to populate it.

Richard, I've been thinking of something along these lines myself, especially as I see the number of translating-X-to-RDF projects go on. I begin to wonder what there is in library data that is *unique*, and my conclusion is: not much. Books, people, places, topics: they all exist independently of libraries, and libraries cannot take the credit for creating any of them. So we should be able to say quite a bit about the resources in libraries using shared data points -- and by that I mean, data points that are also used by others. So once you decide on a model (as BL did), then it is a matter of looking *outward* for the data to re-use.

Yes!

I maintain, however, as per my LITA Forum talk [1] that the subject headings (without talking about quality thereof) and classification designations that libraries provide are an added value, and we should do more to make them useful for discovery.

The wider world is always looking for good ways to categorise things. The library community should make it easy for others to utilise its rich heritage of such things. LCSH is an obvious candidate, as is VIAF, amongst others. The easier we make it, the more uptake there will be and the more inbound links into library resources we will get.
By easier, I am suggesting that efforts to map these library concepts (where they fit) to their wider-world equivalents found in places like DBpedia, the New York Times, and Geonames will greatly enhance the use and visibility of library resources.

I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world. I'll let you battle that one out with Simon :-), but I am often at a loss for a better term to describe the unit of metadata that libraries may create in the future to describe their resources. Suggestions highly welcome.

You are not the only one who is looking for a better term for what is being created - maybe we should hold a competition to come up with one.
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Richard Wallis richard.wal...@talis.com wrote:

You are not the only one who is looking for a better term for what is being created - maybe we should hold a competition to come up with one.

'Named graph' gets thrown around a lot, and even though this is technically correct, it's neither nice nor sexy. In my past a 'bucket' was much used, as you can easily throw things in or take them out (as opposed to the more terminal record being 'set'); however, people have a problem with the conceptual size of said bucket, which more or less summarizes why this term is so hard to pin down. I have, however, seen some revert to the old RDBMS world of rows, as they talk about properties on the same line, just thinking the line to be more flexible than what it used to be, but we'll see if it sticks around.

Personally I think the problem is that people *like* the idea of a closed little silo that is perfectly contained, no matter whether it is technically true or not, and therefore futile. This is also why, I think, it's been so hard to explain to more traditional developers the amazing advantages you get through true semantic modelling; people find it hard to let go of a pattern that has helped them so in the past. Breaking the metadata out of the wonderful constraints of a MARC record? FRBR/RDA will never fly, at least not until they all realize that the constraints are real and that they truly and utterly constrain not just the metadata but the future field of librarying ... :)

Regards,

Alex
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 12 December 2011 11:16, Alexander Johannesen alexander.johanne...@gmail.com wrote:

Richard Wallis richard.wal...@talis.com wrote:

You are not the only one who is looking for a better term for what is being created - maybe we should hold a competition to come up with one.

'Named graph' gets thrown around a lot, and even though this is technically correct, it's neither nice nor sexy.

It also carries lots of baggage from the Linked Data/triple store communities that would get in the way.

In my past a 'bucket' was much used, as you can easily throw things in or take them out (as opposed to the more terminal record being 'set'); however, people have a problem with the conceptual size of said bucket, which more or less summarizes why this term is so hard to pin down.

Yes, most would assume that a bucket would be the place to put their [think of a better word than] records.

I have, however, seen some revert to the old RDBMS world of rows, as they talk about properties on the same line, just thinking the line to be more flexible than what it used to be, but we'll see if it sticks around.

Collection of triples?

Personally I think the problem is that people *like* the idea of a closed little silo that is perfectly contained, no matter whether it is technically true or not, and therefore futile. This is also why, I think, it's been so hard to explain to more traditional developers the amazing advantages you get through true semantic modelling; people find it hard to let go of a pattern that has helped them so in the past.

A classic example of only being able to describe/understand the future in the terms of your past experience.

Breaking the metadata out of the wonderful constraints of a MARC record? FRBR/RDA will never fly, at least not until they all realize that the constraints are real and that they truly and utterly constrain not just the metadata but the future field of librarying ... :)

:-)

~Richard.
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Richard Wallis richard.wal...@talis.com wrote:

Collection of triples?

Yes, no baggage there ... :) Some of us are doing this completely without a single triplet, so I'm not sure it is accurate or even politically correct. *hehe*

A classic example of only being able to describe/understand the future in the terms of your past experience.

Yes, exactly. Although, having said that, I'm excited that the library world is finally taking the semantic challenge seriously. It's taken quite a number of years, but slowly there are a few dribs and drabs happening. Here's to hoping that there's a sluice somewhere about to open fully, and maybe the RDA vehicle has proper wheels? (It didn't the last time I checked, but that's admittedly a couple of years back. I hear they at least got new suspension?)

Regards,

Alex
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Sunday, December 11, 2011 3:47 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

Quoting Richard Wallis richard.wal...@talis.com:

You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [Marc] bib records to identify what could be extracted to populate it.

Richard, I've been thinking of something along these lines myself, especially as I see the number of translating-X-to-RDF projects go on. I begin to wonder what there is in library data that is *unique*, and my conclusion is: not much. Books, people, places, topics: they all exist independently of libraries, and libraries cannot take the credit for creating any of them. So we should be able to say quite a bit about the resources in libraries using shared data points -- and by that I mean, data points that are also used by others. So once you decide on a model (as BL did), then it is a matter of looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here leads me to conclude that linking to existing (or to-be-created) external data (ontologies and representations) is a matter of: being sure what the system's current user's context is, and being able to modify the external data brought into the user's virtual EMU (see below *** before reading further).

I think Simon is right that records will increasingly become virtual in that they are composed as needed by this user for this purpose at this time.
We already see this in practice in many uses, from adding cover art to book MARC records to adding summary information to a management-level report. Being able to link from a book record to a foaf:person and a bib:person record and extract data elements from each as they are needed right now should not be too difficult. As well as a knowledge of the current need, it requires a semantically based mapping of the different elements of those people representations. The neat part is that the total representation for that person may be expressed through both foaf: and bib: facets from a single EMU which contains all things known about that person, and so our two requests for linked data may, in fact should, be mining the same resource, which will translate the data to the format we ask for each time; we will then combine those representations back into a collapsed single data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single unique EMU for the entity which holds all unique information about it for a number of different uses/scenarios/facets/formats. Of course deciding on what is unique and what is obtained from some more granular breakdown is another issue. (Some experience with this onion-skin modeling lies deep in my past, and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data (the EMU) and the transmission form (the data sent to a requesting system when it asks for foaf:person data). They are different and have different requirements. If you are going to allow all these entity data elements to be viewed through a format filter then we have a mixed model, but basically a whole-part relationship between the EMU and the transmission form (e.g. the full data set contains the person's current address, but the transmitted response sends only the city).
Argue amongst yourselves about whether an address is a separate entity and is linked to or not - it makes a simple example to consider it as part of the EMU.

All of this requires that we think of the web of data as being composed not of static entities with a description which is fixed at any snapshot in time, but as dynamic, in that what two users see of the same entity may be different at exactly the same instant. So not only a descriptive model structure, but also a set of semantic mappings, a context-resolution transformation, and the system to implement it each time a link to related data is followed.

I maintain, however, as per my LITA Forum talk [1] that the subject headings (without talking about quality thereof) and classification designations that libraries provide are an added value, and we should do more to make them useful for discovery.

I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world.
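Peter's repository-form versus transmission-form distinction can be sketched in plain Python: one stored EMU per entity, with per-request facet filters that project only the fields a given vocabulary or context should see. All field names, facet definitions, and values here are invented for illustration:

```python
# The repository form: everything known about the entity, stored once.
emu = {
    "name": "Edward Carpenter",
    "homepage": "http://example.org/carpenter",
    "address": {"street": "1 Example Row", "city": "Sheffield"},
}

# Each facet lists the fields (or dotted sub-fields) exposed in transmission.
facets = {
    "foaf": ["name", "homepage"],
    "bib":  ["name", "address.city"],
}

def transmit(emu, facet):
    """Project the stored EMU down to the facet's transmission form."""
    out = {}
    for path in facets[facet]:
        value = emu
        for key in path.split("."):     # walk dotted paths into nested fields
            value = value[key]
        out[path] = value
    return out

foaf_view = transmit(emu, "foaf")   # name and homepage only
bib_view = transmit(emu, "bib")     # the city, never the full address
```

This is the whole-part relationship in miniature: the EMU holds the full address, while each transmission form sends only what its facet allows.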
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 10 December 2011 13:14, Karen Coyle li...@kcoyle.net wrote: I don't believe that anyone is saying that we have a goal of having a re-serialization of ISO 2709 in RDF so that we can begin to use that as our data format. We *do* have millions of records in 2709 with cataloging based on AACR or ISBD or other rules. The move to any future format will have to include some kind of transformation of that data. The result will be something ugly, at least at first: AACR in RDF is not going to be good linked data. I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source Marc will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BnB domain. Then they implemented processes to extract rich data from the source Marc, enrich it with external links, and load it into an RDF representation of the model. On the way, embedded in the extraction/transformation/enrichment processes, there was much ugly data, but that was not exposed beyond the process. An approach I applaud, unlike muddying the waters by attempting to publish vocabularies for every Marc tag you can think of. I believe that you and I share a concern: that current library data is based on such a different model from that of the Semantic Web that by looking at our past data we will fail to understand or take advantage of linked data as it should be. Concern shared. I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage.
I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that. Unfortunately, the library cataloging world has no proposal for linked data cataloging. I'm not sure where we could begin. This is not surprising and I believe, at this stage, it is not a problem. Let's eat the elephant one bite at a time - I envisage a lengthy interim phase where publishing linked bibliographic data derived from traditional Marc records (using processes championed by a community such as CODE4LIB) is the norm. Cataloging processes and systems that use a Linked Data model at the core should then emerge, to satisfy a then-established need. ~Richard -- Richard Wallis Technology Evangelist, Talis http://consulting.talis.com Tel: +44 (0)7767 886 005 Linkedin: http://www.linkedin.com/in/richardwallis Skype: richard.wallis1 Twitter: @rjw IM: rjw3...@hotmail.com
Quoting Richard Wallis richard.wal...@talis.com: I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source Marc will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BnB domain. Then they implemented processes to extract rich data from the source Marc, enrich it with external links, and load it into an RDF representation of the model. Richard, this is an interesting statement about the BL data. Are you saying that they chose a subset of their current bibliographic data to expose as LD? (I haven't found anything yet that describes the process used, so if there is a document I missed, please send a link!) This almost sounds like the FRBR process, BTW - modeling the domain, which is also step one of the Singapore Framework/Dublin Core Application Profile process, then selecting data elements for the domain. [1] FRBR, unfortunately, has perceived problems as a model (which I am attempting to gather up here [2] but may move to the LLD community wiki space to give it more visibility). The work that I'm doing is not based on the assumption that all of MARC will be carried forward. The reason I began my work is that I don't think we know what is in the MARC record -- there is similar data scattered all over, some data that changes meaning as indicators are applied, etc. There is no implication that a future record would have all of those data elements, but at least we should know what data elements there are in our data. On a more practical note, before we can link we need our data in coherent semantic chunks, not broken up into tags, subfields, etc. Concern shared.
I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage. I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that. My concern is the creation of LD silos. BL data uses some known namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web of data that many others are participating in, because your foaf:Person can interact with anyone else's foaf:Person. But there are a great number of efforts that are modeling current records (FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing that would connect the data to anyone else's data (and the ones mentioned would not even connect to each other). So I don't know what you mean by 'part of a Web of Data', but to me using non-silo'd properties is enough to meet that criterion. Another possibility is to create links from your properties to properties outside of your silo, e.g. from RDA:Person to foaf:Person, for sharing and discoverability. I'm more concerned than you are about the issue of cataloging rules. A huge effort has gone into RDA and will now go into the new bibliographic framework. RDA will soon have occupied a decade of scarce library community effort, and the new framework will be based on it, just as RDA is based on FRBR. We've been going in this direction for over 20 years. Meanwhile, look at how much has changed in the world around us. We're moving much more slowly than the world we need to be working within. kc [1] http://dublincore.org/documents/singapore-framework/ [2] http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion Unfortunately, the library cataloging world has no proposal for linked data cataloging. I'm not sure where we could begin.
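Karen's "links from your properties to properties outside of your silo" can be read as a vocabulary alignment a consumer can apply mechanically. A minimal sketch, treating triples as Python tuples; the rda-style URIs and the mapping itself are invented for illustration (real alignments would use rdfs:subClassOf / rdfs:subPropertyOf):

```python
# Hypothetical silo'd data using RDA-like terms.
triples = [
    ("ex:person/42", "rdf:type", "rda:Person"),
    ("ex:person/42", "rda:nameOfPerson", "Edward Carpenter"),
]

# Published alignment: silo terms -> widely shared terms
# (subClassOf/subPropertyOf assertions, flattened to a dict here).
alignment = {
    "rda:Person": "foaf:Person",
    "rda:nameOfPerson": "foaf:name",
}

def broaden(triples, alignment):
    """Add the broader, non-silo'd statements a consumer can infer."""
    inferred = []
    for s, p, o in triples:
        if p in alignment:                       # property alignment
            inferred.append((s, alignment[p], o))
        if p == "rdf:type" and o in alignment:   # class alignment
            inferred.append((s, p, alignment[o]))
    return triples + inferred

for t in broaden(triples, alignment):
    print(t)
```

With the alignment published, anyone's foaf-aware tooling can use the silo'd data without understanding RDA at all.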
This is not surprising and I believe, at this stage, it is not a problem. Let's eat the elephant one bite at a time - I envisage a lengthy interim phase where publishing linked bibliographic data derived from traditional Marc records (using processes championed by a community such as CODE4LIB) is the norm. Cataloging processes and systems that use a Linked Data model at the core should then emerge, to satisfy a then-established need. ~Richard -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Quoting Simon Spero s...@unc.edu: On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis richard.wal...@talis.com wrote: *A record is a silo within a silo* A record within a catalogue duplicates the publisher/author/subject/etc. information stored in adjacent records describing items by the same author/publisher/etc. This community spends much of its effort on the best ways to index and represent this duplication to make records accessible. Ideally an author, for instance, should be described [preferably only once] and then related to all the items they produced. I would argue that this analysis of the nature of what it is to be a record is incomplete, and that a more nuanced analysis sheds light on some of the theoretical and practical problems that came up during the BL Linked Data meeting. From a logical point of view, a bibliographic record can be seen as a theory - that is to say, a consistent set of statements. There may be other records describing the same thing, but the theories they represent need not be consistent with the statements in the first record. The record is the context in which these statements are made. I think there is a big difference between the database view (store each unique thing only once and re-use it), the creation view, and what you do with data in applications. Records may be temporary constructs responding to a particular application need or user query. In terms of library data, a cataloger will appear to be creating a complete description (however that is defined); that description will look logically like a record, and it will need to look like that so that the cataloger can decide when it is complete. In response to queries, the ability to produce different records from the same data has some interesting possibilities because it allows for different views to be created based on the nature of the query.
A geographic view would show resources on a map; an author view would show resources related to people; a topical view could be a topic map. At the individual resource level, what is included in the resource display (record) could be different for each of those views. kc An example of where the removal of context leads to problems can be seen by considering the case of a Document to which FAST headings are assigned by two different catalogers, each of whom has a different opinion as to the primary subject of the Work. Each facet is a separate statement within each theory; each theory may represent a coherent view of the subject, yet the direct combination of the two theories may entail statements that neither indexer believes true. There are also performance benefits that arise from admitting records into one's ontology; a great deal of metalogical information, especially that for provenance, is necessarily identical for all statements made within the same theory: all the statements share the same utterer, and the statements were made at the same time. Instead of repeating this metalogical information for every single statement, provenance information can be maintained and reasoned over just once. Simon
On Sun, Dec 11, 2011 at 10:33 AM, Karen Coyle li...@kcoyle.net wrote: Quoting Simon Spero s...@unc.edu: From a logical point of view, a bibliographic record can be seen as a theory - that is to say, a consistent set of statements. There may be other records describing the same thing, but the theories they represent need not be consistent with the statements in the first record. The record is the context in which these statements are made. I think there is a big difference between the database view (store each unique thing only once and re-use it), the creation view, and what you do with data in applications. Records may be temporary constructs responding to a particular application need or user query. In terms of library data, a cataloger will appear to be creating a complete description (however that is defined); that description will look logically like a record, and it will need to look like that so that the cataloger can decide when it is complete. In response to queries, the ability to produce different records from the same data has some interesting possibilities because it allows for different views to be created based on the nature of the query. A geographic view would show resources on a map; an author view would show resources related to people; a topical view could be a topic map. At the individual resource level, what is included in the resource display (record) could be different for each of those views. I think I may not have explained myself clearly, as well as making an overly obscure allusion to Quine's From A Logical Point Of View (http://www.worldcat.org/title/from-a-logical-point-of-view-9-logico-philosophical-essays/oclc/1658745).
The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only required to be self-consistent, and that it is possible for there to be inconsistencies between two correct descriptions of the same resource. The reason for using FAST headings as an example is that they are post-coordinate, and the subject of a work may not be unique, as Patrick Wilson shows in Two Kinds of Power (http://books.google.com/books?id=DePy_aazKI4C - see Chapter V in particular). There needs to be information linking together all the assertions made as a single unit. I would claim that the entity to which all these statements relate corresponds, at least in part, to the concept of the MARC record as speech act. Simon
On 12/11/2011 08:52 PM, Simon Spero wrote: The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only required to be self-consistent. The reality that library catalog records try to record is the physical book, and in particular its title page. When MARC was invented, it was not realistic to take and store a digital photo of the title page, but today this is entirely realistic. Unlike the book cover, there are most often no copyrighted elements on the title page, so there would be no legal problems. Is photography still absent from library cataloging? I have seen old card catalogs digitized with photos of each card, but I have not yet seen a catalog with photos of title pages. (Unless you count digitization projects like Google Books.) -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se
On Sun, Dec 11, 2011 at 3:25 PM, Lars Aronsson l...@aronsson.se wrote: On 12/11/2011 08:52 PM, Simon Spero wrote: The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only required to be self-consistent. The reality that library catalog records try to record is the physical book, and in particular its title page. When MARC was invented, it was not realistic to take and store a digital photo of the title page, but today this is entirely realistic. Unlike the book cover, there are most often no copyrighted elements on the title page, so there would be no legal problems. Is photography still absent from library cataloging? I have seen old card catalogs digitized with photos of each card, but I have not yet seen a catalog with photos of title pages. (Unless you count digitization projects like Google Books.) [ Many catalogs have cover art - e.g. http://search.lib.unc.edu/search?R=UNCb4450200 . On the recording of title/verso, see e.g. http://onlinelibrary.wiley.com/doi/10.1002/asi.20551/abstract . Under US law the use of thumbnailed cover art for identification purposes is generally considered to be fair use under the rule of Arriba (http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation). Original subject cataloging is not an act of transcription. ] These issues are orthogonal to the point I'm trying to make, which is that records are collections of related assertions, and that the interrelationship between these assertions is a necessary part of their meaning. Simon
Quoting Simon Spero s...@unc.edu: These issues are orthogonal to the point I'm trying to make, which is that records are collections of related assertions, and that the interrelationship between these assertions is a necessary part of their meaning. Simon Simon, I agree that there are *some* assertions that must be part of the same graph to be meaningful - with the FAST headings being a good example. Other assertions do not need that: separate statements that say that the title of book XX8369 (which we will presume for now to be a unique identifier for the manifestation) is 'My book' and that the place of publication of book XX8369 is 'London' don't seem to me to need any context beyond the book XX8369. So in that case, don't the semantically dependent statements get brought together into either blank node graphs or named graphs, while the others hang together based on the identifier for the thing being described? And if someone wants to select a particular set of statements into a collection, will a named graph do? kc
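Karen's suggestion can be sketched as quads: statements that are only meaningful together (Simon's two FAST assignments) each live in their own named graph, while context-free statements hang off the resource identifier in a default graph. The graph names and heading values below are invented for illustration:

```python
# Quads: (graph, subject, predicate, object). Each cataloger's coherent
# description is kept as its own named graph rather than merged blindly.
quads = [
    ("graph:catalogerA", "ex:doc1", "ex:fastHeading", "Trade--History"),
    ("graph:catalogerA", "ex:doc1", "ex:fastHeading", "Europe"),
    ("graph:catalogerB", "ex:doc1", "ex:fastHeading", "Economics"),
    # Context-free statements need no context beyond the identifier.
    ("graph:default", "ex:doc1", "dct:title", "My book"),
    ("graph:default", "ex:doc1", "dct:spatial", "London"),
]

def statements(graph):
    """All statements uttered together in one graph (one 'theory')."""
    return [(s, p, o) for g, s, p, o in quads if g == graph]

# Cataloger A's post-coordinated facets stay combinable with each other,
# but are never silently combined with cataloger B's.
print(statements("graph:catalogerA"))
```

A consumer who merges everything into one graph loses exactly the per-cataloger context Simon is worried about; keeping the quads preserves it.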
Karen, On 11 December 2011 15:18, Karen Coyle li...@kcoyle.net wrote: Quoting Richard Wallis richard.wal...@talis.com: I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source Marc will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BnB domain. Then they implemented processes to extract rich data from the source Marc, enrich it with external links, and load it into an RDF representation of the model. Richard, this is an interesting statement about the BL data. Are you saying that they chose a subset of their current bibliographic data to expose as LD? (I haven't found anything yet that describes the process used, so if there is a document I missed, please send a link!) There is no document I am aware of, but I can point you at the blog post by Tim Hodson [http://consulting.talis.com/2011/07/british-library-data-model-overview/], who helped the BL get to grips with, and start thinking, Linked Data. Another, by the BL's Neil Wilson [http://consulting.talis.com/2011/10/establishing-the-connection/], fills in the background around his recent presentations about their work. You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [Marc] bib records to identify what could be extracted to populate it.
This almost sounds like the FRBR process, BTW - modeling the domain, which is also step one of the Singapore Framework/Dublin Core Application Profile process, then selecting data elements for the domain. [1] FRBR, unfortunately, has perceived problems as a model (which I am attempting to gather up here [2] but may move to the LLD community wiki space to give it more visibility). The BL will tell you that their model is designed to add to the conversation around how to progress the modelling of bibliographic information as Linked Data. There is still a way to go. They are currently looking at how to model multi-part works in the current model and hope to enhance it to bring in other concepts such as FRBR. The work that I'm doing is not based on the assumption that all of MARC will be carried forward. The reason I began my work is that I don't think we know what is in the MARC record -- there is similar data scattered all over, some data that changes meaning as indicators are applied, etc. There is no implication that a future record would have all of those data elements, ... I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world. Concern shared. I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage. I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that. My concern is the creation of LD silos.
BL data uses some known namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web of data that many others are participating in, because your foaf:Person can interact with anyone else's foaf:Person. But there are a great number of efforts that are modeling current records (FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing that would connect the data to anyone else's data (and the ones mentioned would not even connect to each other). So I don't know what you mean by 'part of a Web of Data', but to me using non-silo'd properties is enough to meet that criterion. Another possibility is to create links from your properties to properties outside of your silo, e.g. from RDA:Person to foaf:Person, for sharing and discoverability. There are a couple of ways that your domain can link in to the wider web of data. Firstly, as you identify, by sharing vocabularies. There is a small example in the middle of the BL model, where a Resource is both a dct:BibliographicResource and also (when appropriate) a bibo:Book. In Linked Data there is nothing wrong in mixing ontologies within one domain. If the thing you are modelling is identified as being a foaf:Person, there is no reason why it cannot also be defined as
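The ontology mixing Richard describes in the BL model can be sketched directly as triples: one resource, typed in two vocabularies at once. The identifier here is hypothetical; the two class URIs follow the pattern described above.

```python
# One resource carrying types from two different ontologies. Nothing in
# RDF forbids this, and each type makes the data legible to a different
# set of consumers (Dublin Core tooling vs. BIBO-aware tooling).
resource = "ex:resource/008740700"  # hypothetical identifier
triples = [
    (resource, "rdf:type", "dct:BibliographicResource"),
    (resource, "rdf:type", "bibo:Book"),
    (resource, "dct:title", "Towards Democracy"),
]

def types_of(subject, triples):
    """All rdf:type values asserted for a subject."""
    return {o for s, p, o in triples if s == subject and p == "rdf:type"}

print(types_of(resource, triples))
```

A consumer that only understands one of the two vocabularies simply ignores the other type; nothing is lost by asserting both.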
Quoting Richard Wallis richard.wal...@talis.com: You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [Marc] bib records to identify what could be extracted to populate it. Richard, I've been thinking of something along these lines myself, especially as I see the number of 'translating X to RDF' projects go on. I begin to wonder what there is in library data that is *unique*, and my conclusion is: not much. Books, people, places, topics: they all exist independently of libraries, and libraries cannot take the credit for creating any of them. So we should be able to say quite a bit about the resources in libraries using shared data points -- and by that I mean data points that are also used by others. So once you decide on a model (as BL did), it is then a matter of looking *outward* for the data to re-use. I maintain, however, as per my LITA Forum talk [1], that the subject headings (without talking about quality thereof) and classification designations that libraries provide are an added value, and we should do more to make them useful for discovery. I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world. I'll let you battle that one out with Simon :-), but I am often at a loss for a better term to describe the unit of metadata that libraries may create in the future to describe their resources. Suggestions highly welcome.
kc [1] http://kcoyle.net/presentations/lita2011.html
Quoting Richard Wallis richard.wal...@talis.com: Why bother? Transforming Marc into RDF is an interesting and challenging exercise, but there is little point in doing it without having some potential benefits in mind beyond 'it would be great to have our stuff in a new format'. Richard, perhaps we have been a bit sloppy with our language, and I take some responsibility for that as the initiator of this thread. I don't believe that anyone is saying that we have a goal of having a re-serialization of ISO 2709 in RDF so that we can begin to use that as our data format. We *do* have millions of records in 2709 with cataloging based on AACR or ISBD or other rules. The move to any future format will have to include some kind of transformation of that data. The result will be something ugly, at least at first: AACR in RDF is not going to be good linked data. (The slide that I pointed to earlier from a talk at SWIB11 shows a glass of water and a stem glass of wine -- it refers to MARC data in RDF and asks: if you pour water into a wine glass, does it become wine? Obviously, it does not.) However, all of the library data that we have today to experiment with as linked data is derived from MARC record data. So my initial question was intended to gather a bunch of different solutions as a way of seeing the different views on this. I have started (lord knows if I'll ever have time to finish) an analysis of the data in MARC records http://futurelib.pbworks.com/w/page/29114548/MARC%20elements with an attempt to separate the semantics from the format. That isn't in itself an end goal, but a means to an end -- a way to understand what information we may wish to carry forward into a new metadata environment. The MARC format hides a lot of the meaning by coding it in indicators and spreading it across fields designed for display, etc. I think that an analysis of this type could help us move further from MARC without losing the data we already have created.
I believe that you and I share a concern: that current library data is based on such a different model from that of the Semantic Web that by looking at our past data we will fail to understand or take advantage of linked data as it should be. This is my concern with FRBR and RDA: they are based on that previous model, and cannot be directly expressed as linked data, or at least not as good linked data. Our problem is not so much with MARC, which is a reflection of the catalog record, but with our entire view of the catalog entry as the end product of our work. Unfortunately, the library cataloging world has no proposal for linked data cataloging. I'm not sure where we could begin. kc RDF is a means to an end We shouldn't lose sight of the RDF TLA, Resource Description Framework - it is a framework for describing [our] resources. It is the de facto standard for publishing Linked Data. Publishing descriptions of our resources as Linked Data does fall into the potential benefits arena - reuse, mixing, merging, lowering barriers to use of data across, and from outside of, the library community. If it waddles and quacks, it is probably still a duck Transforming a Marc record to XMLMarc just created the same record in a different wrapper. Apart from the technical benefit (of being able to use generic tools to work with it), it did not move us much further forward towards opening up our data to wider use. Transforming Marc, of any flavor, into an RDF representation of a record still leaves us with a record per item - a digital card catalogue equivalent. A record is a silo within a silo A record within a catalogue duplicates the publisher/author/subject/etc. information stored in adjacent records describing items by the same author/publisher/etc. This community spends much of its effort on the best ways to index and represent this duplication to make records accessible.
Ideally an author, for instance, should be described [preferably only once] and then related to all the items they produced. Linked Data should be the goal At the event mentioned by Mike, Linked Data and Libraries[1], the British Library launched their initial data model for the British National Bibliography[2]. One of the key concepts of Linked Data is to represent data as a set of interlinked things. These things are referred to as objects of interest; they are things about which we can make statements. In this model you get statements about things (e.g. books, authors, publishers, publishing events, subjects, places, etc.) and the links between them - not a record per item. Storing Marc in an RDF triple, or link to it? The question I would ask is: which consumer of your data would this be useful for? Secondly, whatever your answer, it does not make sense to say that this item, or author, or publisher 'thing' was derived from a particular Marc record - you could perhaps, at data set or graph level
On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis richard.wal...@talis.com wrote: *A record is a silo within a silo* A record within a catalogue duplicates the publisher/author/subject/etc. information stored in adjacent records describing items by the same author/publisher/etc. This community spends much of its effort on the best ways to index and represent this duplication to make records accessible. Ideally an author, for instance, should be described [preferably only once] and then related to all the items they produced. I would argue that this analysis of the nature of what it is to be a record is incomplete, and that a more nuanced analysis sheds light on some of the theoretical and practical problems that came up during the BL Linked Data meeting. From a logical point of view, a bibliographic record can be seen as a theory - that is to say, a consistent set of statements. There may be other records describing the same thing, but the theories they represent need not be consistent with the statements in the first record. The record is the context in which these statements are made. An example of where the removal of context leads to problems can be seen by considering the case of a Document to which FAST headings are assigned by two different catalogers, each of whom has a different opinion as to the primary subject of the Work. Each facet is a separate statement within each theory; each theory may represent a coherent view of the subject, yet the direct combination of the two theories may entail statements that neither indexer believes true. There are also performance benefits that arise from admitting records into one's ontology; a great deal of metalogical information, especially that for provenance, is necessarily identical for all statements made within the same theory: all the statements share the same utterer, and the statements were made at the same time.
Instead of repeating this metalogical information for every single statement, provenance information can be maintained and reasoned over just once. Simon
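Simon's performance point can be sketched directly: instead of attaching utterer and timestamp to every single triple, the metalogical information attaches once to the record-as-graph, and any statement's provenance is recovered via its containing graph. The graph names and provenance fields below are invented for illustration:

```python
# Statements grouped by the record (named graph) they were uttered in.
records = {
    "graph:rec1": [
        ("ex:doc1", "dct:title", "My book"),
        ("ex:doc1", "dct:creator", "ex:person/42"),
    ],
}

# Provenance stored once per record, not once per statement: every
# statement in a record shares the same utterer and utterance time.
provenance = {
    "graph:rec1": {"utterer": "catalogerA", "asserted": "2011-12-11"},
}

def provenance_of(statement):
    """Recover who uttered a statement, and when, via its graph."""
    for graph, stmts in records.items():
        if statement in stmts:
            return provenance[graph]
    return None

print(provenance_of(("ex:doc1", "dct:title", "My book")))
# {'utterer': 'catalogerA', 'asserted': '2011-12-11'}
```

With N statements per record this stores one provenance entry instead of N, and reasoning over provenance touches one structure per record rather than one per triple.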
On 7 December 2011 16:29, Karen Coyle li...@kcoyle.net wrote: (As an aside, there is some concern that the use of FRBR will make linking from library bibliographic data to non-library bibliographic data difficult, if not impossible. Having had some contact with members of the FRBR review group, they seem impervious to that concern.) kc I somehow missed out on this thread and its predecessor, until a major fail in the British rail system resulted in an unexpected coffee with Owen yesterday - I hope he got home OK. However, the benefit of being late to a conversation is that you can see where the points of friction are. So a few thoughts on those: Why bother? Transforming Marc into RDF is an interesting and challenging exercise, but there is little point in doing it without having some potential benefits in mind beyond 'it would be great to have our stuff in a new format'. RDF is a means to an end We shouldn't lose sight of the RDF TLA, Resource Description Framework - it is a framework for describing [our] resources. It is the de facto standard for publishing Linked Data. Publishing descriptions of our resources as Linked Data does fall into the potential benefits arena - reuse, mixing, merging, lowering barriers to use of data across, and from outside of, the library community. If it waddles and quacks, it is probably still a duck Transforming a Marc record to XMLMarc just created the same record in a different wrapper. Apart from the technical benefit (of being able to use generic tools to work with it), it did not move us much further forward towards opening up our data to wider use. Transforming Marc, of any flavor, into an RDF representation of a record still leaves us with a record per item - a digital card catalogue equivalent. A record is a silo within a silo A record within a catalogue duplicates the publisher/author/subject/etc. information stored in adjacent records describing items by the same author/publisher/etc.
This community spends much of its effort on the best ways to index and represent this duplication to make records accessible. Ideally an author, for instance, should be described [preferably only once] and then related to all the items they produced.

Linked Data should be the goal. At the event mentioned by Mike, Linked Data and Libraries[1], the British Library launched their initial data model for the British National Bibliography[2]. One of the key concepts of Linked Data is to represent data as a set of interlinked things. These things are referred to as objects of interest; they are things about which we can make statements. In this model you get statements about things (eg. books, authors, publishers, publishing events, subjects, places, etc.) and the links between them - not a record per item.

Store MARC in an RDF triple, or link to it? The question I would ask is: which consumer of your data would this be useful for? Secondly, whatever your answer, it does not make sense to say that this item, or author, or publisher 'thing' was derived from a particular MARC record - you could perhaps, at data set or graph level (using the provenance vocabulary), define that it was transformed from a particular source, at a time, using a method, by a person/process.

Whose ontology? Do we only use library domain ontologies/vocabularies, or do we employ dc, foaf, bibo, etc.? Do we use dc:creator, which most of the [non-library] world will understand, or some esoteric [to them] rda properties to describe corporate and many other nuances of authorship? If you want to enable general application developers/data consumers to use your data, you need to apply the well known [if possibly coarse-grained or lossy] terms. If you want to preserve the rich detail extracted from the source MARC, you need to delve deeper into bibliographically oriented properties. Can you do both? Yes. Should you do both? Probably. ~Richard.
I think I better stop now and contemplate a blog post to further these thoughts.

[1] http://consulting.talis.com/resources/presentations-from-linked-data-and-libraries-2011/
[2] http://consulting.talis.com/2011/07/british-library-data-model-overview/

--
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com
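Richard's "statements about things and the links between them" model can be sketched in Turtle roughly like this. This is an illustrative sketch only - the URIs are invented and the property choices borrow the general approach of models like the BNB's, not their actual terms:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# A thing, not a record: the book itself ...
<http://example.org/resource/12345>
    a bibo:Book ;
    dct:title "An Example Title" ;
    dct:creator <http://example.org/person/CarpenterEdward> .

# ... linked to another thing: the author, described once and
# then related to every item they produced.
<http://example.org/person/CarpenterEdward>
    a foaf:Person ;
    foaf:name "Edward Carpenter" .
```

The contrast with a record per item is that the second resource is not repeated inside every book description; other books by the same author just point at the same URI.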
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 7 Dec 2011, at 00:38, Alexander Johannesen wrote: Hiya, Karen Coyle li...@kcoyle.net wrote: I wonder how easy it will be to manage a metadata scheme that has cherry-picked from existing ones, so something like: dc:title bibo:chapter foaf:depiction

Yes, you're right in pointing out this as a problem. And my answer is: it's complicated. My previous rant on this list was about data models*, and dangnabbit if this isn't related as well. What your example is doing is pointing out a new model based on bits of other models. This works fine, for the most part, when the concepts are simple; simple to understand, simple to extend. Often you'll find that what used to be unclear has grown clear over time (as more and more have used FOAF, you'll find some things are more used and better understood, while other parts of it fade into 'we don't really use that anymore'). But when things get complicated, it *can* render your model unusable. Mixed data models can be good, but can also lead directly to metadata hell. For example: dc:title vs. foaf:title - the first is the title of a resource, the second an honorific such as 'Dr'. Ouch.

Although not a biggie, I see this kind of discrepancy all the time, so the argument against mixed models is of course that the power of definition lies with you rather than some third party that might change their mind (albeit rare) or have similar terms that differ (more often). I personally would say that the library world should define RDA as you need it to be, and worry less about reuse at this stage unless you know for sure that the external models do bibliographic metadata well.

I agree this is a risk, and I suspect there is a further risk around simply the feeling of 'ownership' by the community - perhaps it is easier to feel ownership over an entire ontology than an 'application profile' of some kind.
It may be that mapping is the solution to this, but if this is really going to work I suspect it needs to be done from the very start - otherwise it is just another crosswalk, and we'll get varying views on how much one thing maps to another (but perhaps that's OK - I'm not looking for perfection).

That said, I believe we need absolutely to be aiming for a world in which we work with mixed ontologies - no matter what we do, other relevant data sources will use FOAF, BIBO, etc. I'm convinced that this gives us the opportunity to stop treating what are very mixed materials in a single way, while still exploiting common properties. For example, musical materials are really not well catered for in MARC, and we know there are real issues with applying FRBR to them - and I see the implementation of RDF/Linked Data as an opportunity to tackle this issue by adopting alternative ontologies where it makes sense, while still assigning common properties (dc:title) where this makes sense.

HOWEVER! When we're done talking about ontologies and vocabularies, we need to talk about identifiers, and there I would swing the other way and let reuse govern, because it is when you reuse an identifier that you start thinking about what that identifier means to *both* parties. Or, put differently: it's remarkably easier to get this right if the identifier is a number, rather than some word. And for that reason I'd say reuse identifiers (subject proxies), as they are easier to get right and bring a lot of benefits, but not ontologies (model proxies), as they can be very difficult to get right and don't necessarily give you what you want.

Agreed :)
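The payoff of reusing identifiers can be shown with a toy merge over triples represented as plain Python tuples. The VIAF URI is the one used elsewhere in this thread; the other statements and sources are invented for the sketch:

```python
# Two independently published datasets that happen to reuse the same
# identifier (a VIAF URI) for the same person.
library_data = {
    ("http://viaf.org/viaf/53127337", "foaf:name", "Carpenter, Edward"),
    ("http://viaf.org/viaf/53127337", "rdf:type", "foaf:Person"),
}
other_data = {
    ("http://viaf.org/viaf/53127337", "foaf:name", "Edward Carpenter"),
    ("http://viaf.org/viaf/53127337", "foaf:isPrimaryTopicOf",
     "http://en.wikipedia.org/wiki/Edward_Carpenter"),
}

# Because both sides reused the identifier, merging is just set union:
# no crosswalk, no record-matching heuristics.
merged = library_data | other_data

def describe(subject, graph):
    """Return every statement about one subject in a graph of triples."""
    return sorted(t for t in graph if t[0] == subject)

statements = describe("http://viaf.org/viaf/53127337", merged)
print(len(statements))  # 4 statements about one shared identifier
```

If the two datasets had each minted their own local URI for the person, the union would just yield two disconnected descriptions - exactly the "subject proxy" point above.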
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Hi Owen - I am doing a paper on FRBR, RDF, and linked data, so this thread is very helpful for me. Can you describe the issue with musical materials in MARC and FRBR's impact on them? TIA, Laura On Wed, Dec 7, 2011 at 3:00 AM, Owen Stephens o...@ostephens.com wrote: That said, I believe we need absolutely to be aiming for a world in which we work with mixed ontologies - no matter what we do other, relevant, data sources will use FOAF, Bibo etc.. I'm convinced that this gives us the opportunity to stop treating what are very mixed materials in a single way, while still exploiting common properties. For example Musical materials are really not well catered for in MARC, and we know there are real issues with applying FRBR to them - and I see the implementation of RDF/Linked Data as an opportunity to tackle this issue by adopting alternative ontologies where it makes sense, while still assigning common properties (dc:title) where this makes sense. __ L.B. Johnson Library Tech Program Student City College of San Francisco http://lbjtech.zzl.org CCSF *Guardsman *Archive Blog http://theguardsmandigitalarchive.com
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Owen Stephens o...@ostephens.com: I agree this is a risk, and I suspect there is a further risk around simply the feeling of 'ownership' by the community - perhaps it is easier to feel ownership over an entire ontology than an 'application profile' of some kind. It may be that mapping is the solution to this, but if this is really going to work I suspect it needs to be done from the very start - otherwise it is just another crosswalk, and we'll get varying views on how much one thing maps to another (but perhaps that's OK - I'm not looking for perfection)

I agree with Owen here. One of the advantages of using a mixed vocabulary is that it forces you to think about your own data in relation to that of others, and thus makes it less likely that you will end up in a silo. Just creating your data in RDF is not enough to make linking happen. Look at where LCSH sits on the LD cloud[1] and you see that there are very few links to it. That's not because it isn't in proper RDF; it's because, quite frankly, no one outside of libraries has much use for library subject headings in their current state.

I think that we (whoever 'we' is in this case) should be working hard to create links from RDA elements (which are already defined in RDF)[2] to other vocabularies, like FOAF, DC, BIBO, etc. If it should turn out that links of that nature cannot be made, for example because the content of the data would be significantly different (Tolkien, J. R. R. (John Ronald Reuel), 1892-1973 v. J. R. R. Tolkien), then we need to find a way to MAKE our data play well with that of others. The problem that we have, IMNSHO, is not so much our data FORMAT but our DATA itself. If we don't consider linking outside of the library world, we will just create another silo for ourselves; an RDF silo, but still a silo.

(As an aside, there is some concern that the use of FRBR will make linking from library bibliographic data to non-library bibliographic data difficult, if not impossible.
Having had some contact with members of the FRBR review group, they seem impervious to that concern.) kc

[1] http://linkeddata.org
[2] http://rdvocab.info

That said, I believe we need absolutely to be aiming for a world in which we work with mixed ontologies - no matter what we do, other relevant data sources will use FOAF, BIBO, etc. I'm convinced that this gives us the opportunity to stop treating what are very mixed materials in a single way, while still exploiting common properties. For example, musical materials are really not well catered for in MARC, and we know there are real issues with applying FRBR to them - and I see the implementation of RDF/Linked Data as an opportunity to tackle this issue by adopting alternative ontologies where it makes sense, while still assigning common properties (dc:title) where this makes sense. HOWEVER! When we're done talking about ontologies and vocabularies, we need to talk about identifiers, and there I would swing the other way and let reuse govern, because it is when you reuse an identifier that you start thinking about what that identifier means to *both* parties. Or, put differently: it's remarkably easier to get this right if the identifier is a number, rather than some word. And for that reason I'd say reuse identifiers (subject proxies), as they are easier to get right and bring a lot of benefits, but not ontologies (model proxies), as they can be very difficult to get right and don't necessarily give you what you want. Agreed :)

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
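Karen's Tolkien example - making an inverted authority heading play well with the "J. R. R. Tolkien" form the rest of the web uses - is partly mechanical. A naive, assumption-laden sketch of one direction of that mapping (real authority headings have many forms this would not handle):

```python
import re

def heading_to_display_name(heading: str) -> str:
    """Convert an inverted authority heading to direct order.

    Naive sketch: strips a trailing life-dates segment and any
    parenthetical expansion, then swaps 'Surname, Forenames' to
    'Forenames Surname'. Real headings are far messier than this.
    """
    # Drop a trailing ", 1892-1973"-style date segment.
    s = re.sub(r",\s*\d{4}-(\d{4})?\s*$", "", heading)
    # Drop a parenthetical expansion like "(John Ronald Reuel)".
    s = re.sub(r"\s*\([^)]*\)", "", s)
    surname, _, forenames = s.partition(",")
    return f"{forenames.strip()} {surname.strip()}".strip()

print(heading_to_display_name("Tolkien, J. R. R. (John Ronald Reuel), 1892-1973"))
# J. R. R. Tolkien
```

Of course a lossy transform like this only helps consumers; the reverse direction (reconstructing the authority form) is exactly the information you would lose, which is the argument for publishing both forms and linking them.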
[CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Owen Stephens o...@ostephens.com: This is why RDA worries me - because it (seems to?) suggest that we define a schema that stands alone from everything else and that is used by the library community. I'd prefer to see the library community adopting the best of what already exists and then enhancing where the existing ontologies are lacking.

I've been ruminating a bit on the advantages/disadvantages of re-use vs. create your own then link to others. In the end, I wonder how easy it will be to manage a metadata scheme that has cherry-picked from existing ones, so something like:

dc:title
bibo:chapter
foaf:depiction

but NOT including all properties in those namespaces. It requires any application to have detailed knowledge about the particular selections made. On the other hand, something like:

myNS:title -- sameas -- dc:title
myNS:chapter -- sameas -- bibo:chapter
myNS:depiction -- sameas -- foaf:depiction

allows you to easily identify your properties, but at the same time gives you the equivalents to other properties in other namespaces for sharing. It also gives you greater stability. If the FOAF community should (rudely) change the meaning of depiction, you could find yourself using a property that no longer means what it should. Instead, if you have your own namespace you can change your link to foaf (or remove it altogether) to indicate that you now fork from that property. Perhaps what I perceive is that properties persist over time, while relationships can be more easily treated as relating to now. kc

p.s. I agree with you about RDA, but think that links could be made to remedy that.

If we are going to have a (web of) linked data, then re-use of ontologies and IDs is needed. For example, in the work I did at the Open University in the UK we ended up using only a single property from a specific library ontology (the draft ISBD http://metadataregistry.org/schemaprop/show/id/1957.html has place of publication, production, distribution).
I think it is interesting that many of the MARC-RDF mappings so far have adopted many of the same ontologies (although no doubt partly because there is a 'follow the leader' element to this - or at least there was for me when looking at the transformation at the Open University).

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 5 Dec 2011, at 18:56, Jonathan Rochkind wrote: On 12/5/2011 1:40 PM, Karen Coyle wrote: This brings up another point that I haven't fully grokked yet: the use of MARC kept library data consistent across the many thousands of libraries that had MARC-based systems. Well, only somewhat consistent, but, yeah. What happens if we move to RDF without a standard? Can we rely on linking to provide interoperability without that rigid consistency of data models?

Definitely not. I think this is a real issue. There is no magic to linking or RDF that provides interoperability for free; it's all about the vocabularies/schemata - whether in MARC or in anything else. (Note that different national/regional library communities used different schemata in MARC, which made interoperability infeasible there. Some still do, although gradually people have moved to MARC21 precisely for this reason, even when MARC21 was less powerful than the MARC variant they started with.)

That is to say, if we just used MARC's own implicit vocabularies, but output them as RDF, sure, we'd still have consistency, although we wouldn't really _gain_ much. On the other hand, if we switch to a new, better vocabulary - we've got to actually switch to a new, better vocabulary. If it's just whatever anyone wants to use, we've made it VERY difficult to share data, which is something pretty darn important to us. Of course, the goal of the RDA process (or one of 'em) was to create a new schema for us to consistently use.
That's the library community effort to maintain a common schema that is more powerful and flexible than MARC. If people are using other things instead, apparently that failed, or at least has not yet succeeded. -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
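Karen's local-namespace-plus-mapping pattern could be written out in Turtle along these lines. The myNS namespace is hypothetical, and note one refinement to her sketch: for linking properties (as opposed to individuals) the usual OWL construct is owl:equivalentProperty rather than owl:sameAs, though the latter is often loosely used this way:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix myNS: <http://example.org/ns#> .

# Local properties, stable and under our own control ...
myNS:title     owl:equivalentProperty dc:title .
myNS:chapter   owl:equivalentProperty bibo:chapter .
myNS:depiction owl:equivalentProperty foaf:depiction .

# ... so if FOAF ever changed the meaning of depiction, only the
# last mapping above would need to change or be removed; data
# using myNS:depiction would be untouched.
```

Consumers that understand the external vocabularies can follow the equivalences; the local data itself never has to be rewritten when a mapping changes.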