[CODE4LIB] Bibframe contracts
Dear All,

The Library of Congress has issued two solicitations (RFQs) for Bibframe-related development work. We want to be sure to advertise these opportunities to this community. You can read more about them at the links below.

The first is for a Bibframe Search and Display tool (to be clear, this is /not/ to create a search and display tool for the entire LC catalog):

https://www.fbo.gov/index?s=opportunity&mode=form&id=11db76388c0caafa72f6bd6ccb3d159f&tab=core&_cview=0

The second is for a Bibframe Profiles editor (that is, an editor for the Profiles themselves):

https://www.fbo.gov/index?s=opportunity&mode=form&id=927b167c07002045e51cb8c53485fc4e&tab=core&_cview=0

Proposals for both are due next week (Aug 6). We want to encourage any interested developer, or developer team, to respond to the RFQs. I frankly do not know what rules may pertain to bidding on US government contracts, but a quick review of the requirements for registering for government contracting suggests that it isn't too arduous:

http://www.sba.gov/content/register-government-contracting

For example, a DUNS number and EIN can be acquired in a day. I think these could make for ideal side projects for a small team of interested developers from this community. So many of you have the skills and expertise needed to produce a very interesting software solution to the above solicitations.

I can't answer any questions about these contracts in this forum, but you can use the contacts listed at the bottom of the above pages if you have questions. Those inquiries are forwarded to us, we answer them, and then the information is posted publicly so that everyone interested in the opportunity has access to the same information.

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC
Re: [CODE4LIB] Announcement: Two New Vocabularies added to LC's Linked Data Service
For a variety of reasons, no, we do not have a SPARQL endpoint.

Yours,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Péter Király
Sent: Thursday, June 26, 2014 6:34 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Announcement: Two New Vocabularies added to LC's Linked Data Service

Hi Kevin,

2014-06-25 23:00 GMT+02:00 Ford, Kevin k...@loc.gov:
> The Library of Congress is pleased to make two new vocabularies available as linked data

congratulations, it's very useful. I have a question: do you have a SPARQL endpoint as well?

Regards,
Péter
--
Péter Király
software developer
Europeana - http://europeana.eu
eXtensible Catalog - http://eXtensibleCatalog.org
[CODE4LIB] Bibframe survey
Dear All,

Please see below a copied-and-pasted message, which was posted to the Bibframe listserv and also to a number of (mostly) cataloging listservs. Although the survey was developed and sponsored by the Program for Cooperative Cataloging (PCC), we're interested in the broadest possible feedback from the library community. The code4lib community - comprising developers and other library tech types - is a vital element within the broader community, and one we see as a key stakeholder in this process, and we'd very much like your feedback on the survey below.

On June 20, 2014, the Library of Congress announced its desire to collaborate with the Program for Cooperative Cataloging in the endorsement and support of BIBFRAME as the model to help the library community move into the Linked Data environment. PCC and LC strongly encourage the PCC membership and the broader library community to become more knowledgeable and attuned to the development and rollout of BIBFRAME and how it fits within libraries and the larger Linked Data sphere.

The PCC Secretariat has created a BIBFRAME survey that aims to assess the current level of understanding of BIBFRAME within the PCC community and the wider information community. The survey also asks for ways in which information and announcements on BIBFRAME can be shared more widely within the communities. The PCC Secretariat encourages all PCC members to take the survey, and requests that PCC members share the survey widely with colleagues in all spheres of library work - vendors, systems, acquisitions, and other areas. You do not need to be a PCC member in order to take the survey!

The survey should take approximately 10 minutes or less to complete, and you may remain anonymous if you wish.

https://www.surveymonkey.com/s/PCC-BIBFRAME-2014

The survey will close on Monday, July 14, 2014.

--

I can vouch that it should take only a little of your valuable time to complete.

Cordially,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC
[CODE4LIB] Announcement: Two New Vocabularies added to LC's Linked Data Service
The Library of Congress is pleased to make two new vocabularies available as linked data from LC's Linked Data Service, ID.LOC.GOV: the Library of Congress Medium of Performance Thesaurus for Music (LCMPT) and the American Folklore Society's Ethnographic Thesaurus (AFSET).

The LCMPT is a linked data representation of terminology to describe the instruments, voices, etc., used in the performance of musical works. The AFSET is a linked data representation of terms that can be used to improve access to information about folklore, ethnomusicology, ethnology, and related fields. While LCMPT is relatively small, with fewer than 1,000 entries, AFSET includes more than 16,000 concepts.

Bulk downloads have been made available from the Downloads page for each dataset. On a related note, a number of bulk downloads - such as those for Children's Subject Headings and Genre/Form Headings - have also been updated.

** Please explore them for yourself at

LCMPT - http://id.loc.gov/authorities/performanceMediums
AFSET - http://id.loc.gov/vocabulary/ethnographicTerms

** Contact Us about ID:

As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements. You can send comments or report any problems to us via the ID feedback form or the ID listserv (see the web site).

Background:

The LC Linked Data Service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data environment.
In part by assigning each vocabulary and each data value within it a unique resource identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as the Simple Knowledge Organization System (SKOS) and the more detailed MADS/RDF vocabulary. In this way, the LC Linked Data Service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies.

The new datasets join the term and code lists already available through the service:

* Library of Congress Subject Headings (LCSH)
* Library of Congress Children's Subject Headings
* Library of Congress Genre/Form Terms
* Library of Congress / NACO Name Authority File
* Library of Congress / LCC (select schedules)
* Thesaurus of Graphic Materials
* Cultural Heritage Organizations
* MARC Code List for Relators
* MARC Code List for Countries (which reference their equivalent ISO 3166 codes)
* MARC Code List for Geographic Areas
* MARC Code List for Languages (cross-referenced with ISO 639-1, 639-2, and 639-5, where appropriate)
* PREMIS vocabularies

The above code lists also contain links to appropriate LCSH and LC/NAF headings.

LC's Linked Data Service is managed by the Network Development and MARC Standards Office of the Library of Congress.

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC
Re: [CODE4LIB] [CODE4LIB] HEADS UP - Government shutdown will mean *.loc.gov is going offline October 1
All *.loc.gov web sites will be closed, including the two you quoted. The Internet Archive's Wayback Machine is probably your best bet for these types of things:

http://web.archive.org/web/*/http://www.loc.gov/marc/
http://web.archive.org/web/*/http://www.loc.gov/standards/sourcelist/index.html

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Becky Yoose
Sent: Monday, September 30, 2013 4:32 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] [CODE4LIB] HEADS UP - Government shutdown will mean *.loc.gov is going offline October 1

FYI - this also means that there's a very good chance that the MARC standards site [1] and the Source Codes site [2] will be down as well. I don't know if there are any mirror sites out there for these pages.

[1] http://www.loc.gov/marc/
[2] http://www.loc.gov/standards/sourcelist/index.html

Thanks,
Becky, about to be (forcefully) parted from her standards documentation

On Mon, Sep 30, 2013 at 11:39 AM, Jodi Schneider jschnei...@pobox.com wrote:

Interesting -- thanks, Birkin -- and tell us what you think when you get it implemented! :)

-Jodi

On Mon, Sep 30, 2013 at 5:19 PM, Birkin Diana birkin_di...@brown.edu wrote:

...you'd want to create a caching service...

One solution for a relevant particular problem (not full-blown linked-data caching): http://en.wikipedia.org/wiki/XML_Catalog

Excerpt: "However, if they are absolute URLs, they only work when your network can reach them. Relying on remote resources makes XML processing susceptible to both planned and unplanned network downtime."

We'd heard about this a while ago, but, Jodi, you and David Riordan and Congress have caused a temporary retreat from normal sprint-work here at Brown today to investigate implementing this!
:/

The particular problem that would affect us: if your processing tool checks, say, an loc.gov MODS namespace URL, that processing will fail if the loc.gov URL isn't available, unless you've implemented XML Catalog, which is a formal way to locally resolve such external references.

-b
---
Birkin James Diana
Programmer, Digital Technologies
Brown University Library
birkin_di...@brown.edu

On Sep 30, 2013, at 7:15 AM, Uldis Bojars capts...@gmail.com wrote:

What are best practices for preventing problems in cases like this, when an important Linked Data service may go offline?

--- originally this was a reply to Jodi, which she suggested posting to the list too ---

A safe [pessimistic?] approach would be to say we don't trust [the reliability of] linked data on the Web, as services can and will go down, and to cache everything. In that case you'd want to create a caching service that keeps updated copies of all important Linked Data sources, plus a fall-back strategy for switching to this caching service when needed. Like archive.org for Linked Data.

Some semantic web search engines might already have subsets of the Linked Data web cached, but I'm not sure how much they cover (e.g., whether they have all of LoC's data, up to date).

If one were to create such a service, how best to update it, considering you'd be requesting *all* Linked Data URIs from each source? An efficient approach would be to regularly load RDF dumps for every major source, if available (e.g., LoC says: here's a full dump of all our RDF data ... and a .torrent too).

What do you think?

Uldis

On 29 September 2013 12:33, Jodi Schneider jschnei...@pobox.com wrote:

Any best practices for caching authorities/vocabs to suggest for this thread on the Code4Lib list? Linked Data authorities and vocabularies at the Library of Congress (id.loc.gov) are going to be affected by the website shutdown -- because of lack of government funds.

-Jodi
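The XML Catalog approach Birkin describes can be sketched as a small catalog file that maps a remote schema URL to a local copy, so processing keeps working when the network resource is unreachable. The schema URL below is only an example of the kind of loc.gov resource a tool might dereference, and the local path is hypothetical:

```xml
<?xml version="1.0"?>
<!-- Sketch of an OASIS XML Catalog entry. Requests for the remote
     MODS schema URL are resolved to a local file instead of the
     network; the local path is hypothetical. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
       uri="file:///usr/local/share/schemas/mods-3-4.xsd"/>
</catalog>
```

Many libxml2-based tools (xmllint, for example) will consult such a catalog if it is listed in the XML_CATALOG_FILES environment variable, so no application code changes are needed.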
Re: [CODE4LIB] Marcive.com hosts are compromised
Righty. I had to view the source, but I saw the injected text. I gave the one contact I know at Marcive a call. She saw it too.

Yours,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Sam Kome
Sent: Friday, August 30, 2013 3:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Marcive.com hosts are compromised

Sorry about that - I mistype 'Marcive' all the time. Despite that, it is the site I meant, sans 'h'. It will resolve correctly, but I wouldn't advise visiting - take precautions. Google search results also suggest it is compromised, and the page sources contain pharma metadata. I emailed and then called the technical contact number. Got a response on the phone; sounded like they were unaware but would look into it. Our Collections folks report not receiving expected reports this month, so the problem may be fairly old.

SK

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ford, Kevin
Sent: Friday, August 30, 2013 12:04 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Marcive.com hosts are compromised

http://marcive.com goes to the right place for me. It is the one you mentioned in the subject line of your email. http://marchive.com (note the h) goes to a domain squatter. It is the one you mentioned in the body of your email. Which one is causing you the issue?

Cordially,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Sam Kome
Sent: Friday, August 30, 2013 2:07 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Marcive.com hosts are compromised

Based on the pharmaceutical ads in their page sources and the fact that our Cisco IronPort has blacklisted them, I have to regretfully report that marchive.com has been compromised. Does anyone know the relevant contact(s) there to notify?

Sam Kome | Assistant Director, R&D | The Claremont Colleges Library
Claremont University Consortium | 800 N. Dartmouth Ave | Claremont, CA 91711
Phone (909) 621-8866 | Fax (909) 621-8517 | sam_k...@cuc.claremont.edu
Re: [CODE4LIB] Marcive.com hosts are compromised
http://marcive.com goes to the right place for me. It is the one you mentioned in the subject line of your email. http://marchive.com (note the h) goes to a domain squatter. It is the one you mentioned in the body of your email. Which one is causing you the issue?

Cordially,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Sam Kome
Sent: Friday, August 30, 2013 2:07 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Marcive.com hosts are compromised

Based on the pharmaceutical ads in their page sources and the fact that our Cisco IronPort has blacklisted them, I have to regretfully report that marchive.com has been compromised. Does anyone know the relevant contact(s) there to notify?

Sam Kome | Assistant Director, R&D | The Claremont Colleges Library
Claremont University Consortium | 800 N. Dartmouth Ave | Claremont, CA 91711
Phone (909) 621-8866 | Fax (909) 621-8517 | sam_k...@cuc.claremont.edu
[CODE4LIB] Announcement: Cultural Heritage Organizations Vocabulary Published
The Library of Congress is pleased to make the Cultural Heritage Organizations vocabulary available as linked data from LC's Linked Data Service, ID.LOC.GOV.

The Cultural Heritage Organizations vocabulary is a linked data representation of the MARC Organizations code list, which, among other uses, is an essential reference tool for those dealing with MARC records, for systems reporting library holdings, for many interlibrary loan systems, and for those who may be organizing cooperative projects on a regional, national, or international scale. While the Cultural Heritage Organizations vocabulary focuses on US institutions, with over 30,000 defined, it also includes codes for institutions in other countries that have requested them. However, MARC codes are not assigned for institutions in Canada, Germany, or the United Kingdom unless the institution is a branch of a US institution. Overall, the vocabulary contains over 36,000 entries.

Bulk downloads of the Cultural Heritage Organizations vocabulary are also available from the downloads page.

** Please explore the Cultural Heritage Organizations vocabulary yourself at

http://id.loc.gov/vocabulary/organizations

** Contact Us about ID:

As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements. You can send comments or report any problems to us via the ID feedback form or the ID listserv (see the web site).

Background:

The LC Linked Data Service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data environment.
In part by assigning each vocabulary and each data value within it a unique resource identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as the Simple Knowledge Organization System (SKOS) and the more detailed MADS/RDF vocabulary. In this way, the LC Linked Data Service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies.

The new dataset joins the term and code lists already available through the service:

* Library of Congress Subject Headings (LCSH)
* Library of Congress Children's Subject Headings
* Library of Congress Genre/Form Terms
* Library of Congress / NACO Name Authority File
* Library of Congress / LCC (select schedules)
* Thesaurus of Graphic Materials
* MARC Code List for Relators
* MARC Code List for Countries (which reference their equivalent ISO 3166 codes)
* MARC Code List for Geographic Areas
* MARC Code List for Languages (cross-referenced with ISO 639-1, 639-2, and 639-5, where appropriate)
* PREMIS vocabularies

The above code lists also contain links to appropriate LCSH and LC/NAF headings.

LC's Linked Data Service is managed by the Network Development and MARC Standards Office of the Library of Congress.

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC
Re: [CODE4LIB] best way to make MARC files available to anyone
Dear Dana,

Thanks for the detail. Based on the few example comparisons I've seen, I very much prefer your MARC records. Not only are they richer, they break up the data better.

Yours,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Wednesday, June 12, 2013 7:20 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] best way to make MARC files available to anyone

Kevin, Eric

7zip worked fine to unzip, and the records look pretty good, since they used 653 and preserved the string from the metadata element with the hyphens. However, the records do not do subfield d in the 100 or 700 fields, and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors, though.

My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January, when I could devote 2 or 3 hours to the post-transform editing. Thought I'd just dive in, but the pool was much deeper than I had anticipated. I do think libraries will prefer my edited versions, although they differ in non-access points as well. Incidentally, not many additions since my harvest.

First record in the Project Gutenberg-produced records:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

couldn't readily find the above item, but here's an example of my records by the same author.
=LDR 01002nam a22002535 4500
=001 PG18997
=006 md
=007 cr||n\|||muaua
=008 \\s2006utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=245 14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$cAlexandre Dumas.
=260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
=300 \\$a1 online resource :$bmultiple file formats.
=500 \\$aRecords generated from Project Gutenberg RDF data.
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650 \0$aAdventure stories.
=650 \0$aHistorical fiction.
=651 \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655 \0$aElectronic books.
=710 2\$aProject Gutenberg.
=856 40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

Thanks for your interest.

regards,
dana

On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.:

http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP, but I don't know if that is the best way. I don't have an FTP client myself, so I was thinking that may now be passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution.
I can access the files with the URLs provided by setting the access to anyone with the URL, but that doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous, and non-Latin scripts such as Chinese and Modern Greek. Most of the content is in the ebook folder. I would like to make access as easy as possible. Google Drive seems to work for me.

Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.:

http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP, but I don't know if that is the best way. I don't have an FTP client myself, so I was thinking that may now be passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL, but that doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous, and non-Latin scripts such as Chinese and Modern Greek. Most of the content is in the ebook folder. I would like to make access as easy as possible. Google Drive seems to work for me.

Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

--
Dana Pearson
dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Doh! I read all the emails in the thread except for Eric's, which asked the same question. Either way, his or mine, nevertheless curious.

Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Phetteplace
Sent: Tuesday, June 11, 2013 10:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] best way to make MARC files available to anyone

Dana - perhaps a public Dropbox folder? Or just put the files up on your site somewhere, served with a Content-Disposition: attachment header so they trigger a download when accessed? E.g., here's a StackOverflow thread on that: http://stackoverflow.com/questions/9195304/how-to-use-content-disposition-for-force-a-file-to-download-to-the-hard-drive If they must be a recognized MIME type, you could compress them as .zip or .tar.gz files on the server, which would reduce download time either way. I did try clicking the links on your site and they never downloaded; the request just timed out.

Not to discredit what you're doing, which is great, but aren't MARC records already available for Project Gutenberg? See their offline catalogs page: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28offsite.29

Best,
Eric Phetteplace
Emerging Technologies Librarian
Chesapeake College
Wye Mills, MD

On Tue, Jun 11, 2013 at 9:24 PM, Dana Pearson dbpearsonm...@gmail.com wrote:

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP, but I don't know if that is the best way. I don't have an FTP client myself, so I was thinking that may now be passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution.
I can access the files with the URLs provided by setting the access to anyone with the URL, but that doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous, and non-Latin scripts such as Chinese and Modern Greek. Most of the content is in the ebook folder. I would like to make access as easy as possible. Google Drive seems to work for me.

Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

--
Dana Pearson
dbpearsonmlis.com
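Eric's Content-Disposition suggestion can be sketched in a few lines of Python's standard-library HTTP server. This is a hypothetical example, not Dana's actual setup; the .mrc/.mrk extensions are assumptions about how the MARC files might be named, and any server that can set response headers (Apache's mod_headers, nginx's add_header) achieves the same effect:

```python
from http.server import SimpleHTTPRequestHandler

def content_disposition_for(path):
    """Return a Content-Disposition header value for MARC files, else None.

    The .mrc/.mrk extensions are assumptions about the file naming."""
    if path.endswith((".mrc", ".mrk")):
        return "attachment"
    return None

class DownloadHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, forcing a download
    dialog (rather than in-browser rendering) for MARC records."""

    def end_headers(self):
        value = content_disposition_for(self.path)
        if value:
            self.send_header("Content-Disposition", value)
        super().end_headers()
```

The header-choosing logic is kept in a plain function so it can be reused with any server framework.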
Re: [CODE4LIB] LOC Subject Headings API
Dear Josh,

Take a look at Mike's email below, which may have quickly fallen down the inbox, helped along by an unhelpful reply. It shows the suggest pattern, but to repeat the general pattern:

This will provide auto-suggestions for Subjects, ChildrensSubjects, GenreForms, and Names:

http://id.loc.gov/authorities/suggest/?q=Hounds

This will provide auto-suggestions for Subjects only (replace "subjects" with "names" for only names, and so on):

http://id.loc.gov/authorities/subjects/suggest/?q=Hounds

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Michael J. Giarlo
Sent: Tuesday, June 04, 2013 8:05 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] LOC Subject Headings API

How about id.loc.gov's OpenSearch-powered autosuggest feature?

mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
["Biology",
 ["Biology","Biology Colloquium","Biology Curators' Group","Biology Databook Editorial Board (U.S.)","Biology and Earth Sciences Teaching Institute","Biology and Management of True Fir in the Pacific Northwest Symposium (1981 : Seattle, Wash.)","Biology and Resource Management Program (Alaska Cooperative Park Studies Unit)","Biology and behavior series","Biology and environment (Macmillan Press)","Biology and management of old-growth forests"],
 ["1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result"],
 ["http://id.loc.gov/authorities/subjects/sh85014203","http://id.loc.gov/authorities/names/n79006962","http://id.loc.gov/authorities/names/n90639795","http://id.loc.gov/authorities/names/n85100466","http://id.loc.gov/authorities/names/nr97041787","http://id.loc.gov/authorities/names/n85276541","http://id.loc.gov/authorities/names/n82057525","http://id.loc.gov/authorities/names/n90605518","http://id.loc.gov/authorities/names/nr2001011448","http://id.loc.gov/authorities/names/no94028058"]]

-Mike

On Tue, Jun 4, 2013
at 7:51 PM, Joshua Welker jwel...@sbuniv.edu wrote:

I did see that, and it will work in a pinch. But the authority file is pretty massive - almost 1 GB - and would be difficult to handle in an automated way without completely killing my web app due to memory constraints while searching the file. Thanks, though.

Josh Welker

-----Original Message-----
From: Bryan Baldus [mailto:bryan.bal...@quality-books.com]
Sent: Tuesday, June 04, 2013 6:39 PM
To: Code for Libraries; Joshua Welker
Subject: RE: LOC Subject Headings API

On Tuesday, June 04, 2013 6:31 PM, Joshua Welker [jwel...@sbuniv.edu] wrote:

I am building an auto-suggest feature into our library's search box, and I want to include LOC subject headings in my suggestions list. Does anyone know of any web service that allows for automated harvesting of LOC Subject Headings? I am also looking for name authorities, for that matter. Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I have spent a while Googling with no luck, but this seems like the sort of general-purpose thing that a lot of people would be interested in. I feel like I must be missing something. Any help is appreciated.

Have you seen http://id.loc.gov/ with bulk downloads in various formats at http://id.loc.gov/download/ ?

I hope this helps,

Bryan Baldus
Senior Cataloger
Quality Books Inc.
The Best of America's Independent Presses
1-800-323-4241 x402
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/
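The suggest responses quoted in this thread follow the OpenSearch suggestions format: a four-element JSON array of query, completion labels, descriptions, and URIs. A minimal sketch of turning one into (label, URI) pairs; the sample payload is a trimmed, hard-coded copy of Mike's curl output rather than a live request:

```python
import json

# Trimmed sample of the OpenSearch suggest response from
# http://id.loc.gov/authorities/suggest/?q=Biology (hard-coded here
# so the sketch runs without network access).
sample = ('["Biology",'
          '["Biology","Biology Colloquium"],'
          '["1 result","1 result"],'
          '["http://id.loc.gov/authorities/subjects/sh85014203",'
          '"http://id.loc.gov/authorities/names/n79006962"]]')

def parse_suggest(payload):
    """Return (label, uri) pairs from an OpenSearch suggestions array."""
    _query, labels, _descriptions, uris = json.loads(payload)
    return list(zip(labels, uris))

pairs = parse_suggest(sample)
# pairs[0] → ("Biology", "http://id.loc.gov/authorities/subjects/sh85014203")
```

The descriptions column ("1 result", ...) is ignored here, but it is available if you want to show it alongside each completion.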
Re: [CODE4LIB] LOC Subject Headings API
"This would work, except I would need a way to get all the subjects rather than just biology."

-- If you want all the subjects, period, take a look at the download page:

http://id.loc.gov/download/

There are bulk downloads for LCSH and the LC/NACO file of Names. The suggest service (described in a separate email) is designed to give you the top 10 best matches based on a left-anchored search, so that it can function as a real-time type-ahead service.

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua Welker
Sent: Wednesday, June 05, 2013 9:14 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] LOC Subject Headings API

This would work, except I would need a way to get all the subjects rather than just biology. Any idea how to do that? I tried removing the querystring from the URL and changing Biology in the URL to with no success.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Michael J. Giarlo
Sent: Tuesday, June 04, 2013 7:05 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] LOC Subject Headings API

How about id.loc.gov's OpenSearch-powered autosuggest feature?
mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
["Biology",["Biology","Biology Colloquium","Biology Curators' Group","Biology Databook Editorial Board (U.S.)","Biology and Earth Sciences Teaching Institute","Biology and Management of True Fir in the Pacific Northwest Symposium (1981 : Seattle, Wash.)","Biology and Resource Management Program (Alaska Cooperative Park Studies Unit)","Biology and behavior series","Biology and environment (Macmillan Press)","Biology and management of old-growth forests"],["1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result","1 result"],["http://id.loc.gov/authorities/subjects/sh85014203","http://id.loc.gov/authorities/names/n79006962","http://id.loc.gov/authorities/names/n90639795","http://id.loc.gov/authorities/names/n85100466","http://id.loc.gov/authorities/names/nr97041787","http://id.loc.gov/authorities/names/n85276541","http://id.loc.gov/authorities/names/n82057525","http://id.loc.gov/authorities/names/n90605518","http://id.loc.gov/authorities/names/nr2001011448","http://id.loc.gov/authorities/names/no94028058"]]
-Mike On Tue, Jun 4, 2013 at 7:51 PM, Joshua Welker jwel...@sbuniv.edu wrote: I did see that, and it will work in a pinch. But the authority file is pretty massive--almost 1GB-- and would be difficult to handle in an automated way and without completely killing my web app due to memory constraints while searching the file. Thanks, though. Josh Welker -Original Message- From: Bryan Baldus [mailto:bryan.bal...@quality-books.com] Sent: Tuesday, June 04, 2013 6:39 PM To: Code for Libraries; Joshua Welker Subject: RE: LOC Subject Headings API On Tuesday, June 04, 2013 6:31 PM, Joshua Welker [jwel...@sbuniv.edu] wrote: I am building an auto-suggest feature into our library's search box, and I am wanting to include LOC subject headings in my suggestions list. Does anyone know of any web service that allows for automated harvesting of LOC Subject Headings? 
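To consume that suggest response in code, a minimal Python sketch along these lines pairs each suggested label with its id.loc.gov URI. The function and variable names are illustrative, and the sample is an abbreviated copy of the response above:

```python
import json

def parse_suggest(response_text):
    """Parse an id.loc.gov OpenSearch-style suggest response into
    (label, uri) pairs. The response is a four-element JSON array:
    [query, [labels...], [counts...], [uris...]]."""
    query, labels, counts, uris = json.loads(response_text)
    return list(zip(labels, uris))

# Abbreviated sample modeled on the curl output above.
sample = json.dumps([
    "Biology",
    ["Biology", "Biology Colloquium"],
    ["1 result", "1 result"],
    ["http://id.loc.gov/authorities/subjects/sh85014203",
     "http://id.loc.gov/authorities/names/n79006962"],
])

for label, uri in parse_suggest(sample):
    print(label, "->", uri)
```

In a real type-ahead widget, the same parsing would be applied to the live response from the suggest URL shown in the curl example.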
I am also looking for name authorities, for that matter. Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I have spent a while Googling with no luck, but this seems like the sort of general-purpose thing that a lot of people would be interested in. I feel like I must be missing something. Any help is appreciated. Have you seen http://id.loc.gov/ with bulk downloads in various formats at http://id.loc.gov/download/ I hope this helps, Bryan Baldus Senior Cataloger Quality Books Inc. The Best of America's Independent Presses 1-800-323-4241x402 bryan.bal...@quality-books.com eij...@cpan.org http://home.comcast.net/~eijabb/
Re: [CODE4LIB] LOC Subject Headings API
it looks like LCSH is moving past this string-based hierarchy in favor of one expressed in terms of linked data. -- Oh, I've never received that impression. Pre-coordination - which you referred to as hierarchical sets of terms - is alive and well. A number of studies were done in the second half of the 2000s that looked at the creation of LCSH headings. Pre-coordination received significant attention in these studies and was ultimately confirmed as a good thing. Who knows why the precoordinated heading that was once used for Mexican War, 1846-1848 was replaced, but that probably happened in 1986 (or 1991) based on the creation and most-recent modification times on that record. In other words, at a time when the notion of Linked Data was non-existent. Yours, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ethan Gruber Sent: Wednesday, June 05, 2013 9:41 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Subject Headings API Are you referring to hierarchical sets of terms, like United States-- History--War with Mexico, 1845-1848? This is an earlier established term of http://id.loc.gov/authorities/subjects/sh85140201 (now labeled Mexican War, 1846-1848). Ed Summers or Kevin Ford are in a better position to discuss the change of terminology, but it looks like LCSH is moving past this string-based hierarchy in favor of one expressed in terms of linked data. Ethan On Wed, Jun 5, 2013 at 9:32 AM, Joshua Welker jwel...@sbuniv.edu wrote: I've seen those, but I can't figure out where on the id.loc.gov site there is actually a URL that provides a list of authority terms. All the links on the site seem to link to other pages within the site. 
Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson Sent: Tuesday, June 04, 2013 6:42 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Subject Headings API Joshua, There are different formats at LOC: http://id.loc.gov/authorities/subjects.html dana On Tue, Jun 4, 2013 at 6:31 PM, Joshua Welker jwel...@sbuniv.edu wrote: I am building an auto-suggest feature into our library's search box, and I am wanting to include LOC subject headings in my suggestions list. Does anyone know of any web service that allows for automated harvesting of LOC Subject Headings? I am also looking for name authorities, for that matter. Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I have spent a while Googling with no luck, but this seems like the sort of general-purpose thing that a lot of people would be interested in. I feel like I must be missing something. Any help is appreciated. Josh Welker Electronic/Media Services Librarian College Liaison University Libraries Southwest Baptist University 417.328.1624 -- Dana Pearson dbpearsonmlis.com
[CODE4LIB] K Class added to ID.LOC.GOV
The Library of Congress is pleased to make the K Class - Law Classification - and all its subclasses available as linked data from LC's Linked Data Service, ID.LOC.GOV. K Class joins the B, N, M, and Z Classes released in June 2012. With about 2.2 million new resources added to ID.LOC.GOV, K Class is nearly eight times larger than the B, M, N, and Z Classes combined. It is four times larger than LCSH. If it is not the largest class, it is second only to the P Class (Literature) in the Library of Congress Classification system. We have also taken the opportunity to re-compute and reload the B, M, N, and Z classes in response to a few reported errors. Our gratitude to Caroline Arms for her work crawling through B, M, N, and Z and identifying a number of these issues. The classification section of ID.LOC.GOV remains a beta offering. More work is needed not only to add the additional classes to the system but also to continue to work out issues with the data. We continue to encourage the submission of use cases describing how users would like to utilize the LCC data. ** Please explore the K Class for yourself at http://id.loc.gov/authorities/classification/K or all of the classes at http://id.loc.gov/authorities/classification ** Contact Us about ID: As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements. You can send comments or report any problems to us via the ID feedback form or ID listserv (see the web site). Background: The LC Linked Data Service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data environment. 
In part by assigning each vocabulary and each data value within it a unique resource identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as Simple Knowledge Organization System (SKOS), and the more detailed vocabulary MADS/RDF. In this way, the LC Linked Data Service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies. The new datasets join the term and code lists already available through the service: * Library of Congress Subject Headings (LCSH) * Library of Congress Children's Subject Headings * Library of Congress Genre/Form Terms * Library of Congress / NACO Name Authority File * Thesaurus of Graphic Materials * MARC Code List for Relators * MARC Code List for Countries (which reference their equivalent ISO 3166 codes) * MARC Code List for Geographic Areas * MARC Code List for Languages (which have been cross referenced with ISO 639-1, 639-2, and 639-5, where appropriate) * PREMIS vocabularies for Cryptographic Hash Functions, Preservation Events, and Preservation Level Roles The above code lists also contain links with appropriate LCSH and LC/NAF headings. LC's Linked Data Service is managed by the Network Development and MARC Standards Office of the Library of Congress. -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC
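Since the announcement notes that resources are accessible programmatically via content negotiation, a small sketch of building such a request may be useful. The mapping of format names to media types below is an assumption on my part (based on the RDF serializations the announcement mentions), not something specified in the message; the request is built but deliberately not sent:

```python
import urllib.request

# Media types one might request from ID.LOC.GOV via content negotiation.
# These specific types are assumptions based on the serializations the
# announcement mentions (RDF/XML, n-triples, JSON), not a documented list.
ACCEPT_TYPES = {
    "rdfxml": "application/rdf+xml",
    "ntriples": "text/plain",  # n-triples has traditionally been served as text/plain
    "json": "application/json",
}

def build_request(uri, fmt="rdfxml"):
    """Build (but do not send) a content-negotiated request for an
    id.loc.gov resource, e.g. a classification number or heading."""
    return urllib.request.Request(uri, headers={"Accept": ACCEPT_TYPES[fmt]})

req = build_request("http://id.loc.gov/authorities/classification/K", "rdfxml")
print(req.full_url, req.get_header("Accept"))
```

Sending the request with `urllib.request.urlopen(req)` would then return the negotiated representation, assuming the server honors the Accept header for that resource.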
[CODE4LIB] LC Systems Maintenance this coming weekend (9-12 November)
All Library of Congress systems will be taken offline beginning Friday evening. This includes LCCN Permalink, Z39.50 and SRU services, ID.LOC.GOV, all listservs, and, of course, the catalog. *All* Library systems. Service will be restored by Tuesday. The Library of Congress has planned extensive electrical work and power maintenance for this coming weekend. As a protective measure, all Library systems will be powered down. The maintenance period is scheduled for completion by Tuesday morning, when it is expected all Library systems will have been restored to normal operation. Though it is anticipated work will not be fully completed until late Monday (or very early Tuesday morning), services will start coming back online many hours before then. We regret any inconvenience this may cause. Kevin -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC
Re: [CODE4LIB] hathitrust
Ideally, you shouldn't need the hathifiles. The HathiTrust search page links to an OpenSearch document [1], which promisingly identifies an RSS feed and a JSON serialization of the search results. Neither appears to work. In theory, doing as Jon says and then appending view=rss would get you an RSS feed. There is a contact email in the OpenSearch document you might try. FWIW, if you look at the search page HTML, there is a fixme note in an HTML comment, the same comment, incidentally, that also comments out the RSS feed link in the HTML. Yours, Kevin [1] http://catalog.hathitrust.org/Search/OpenSearch?method=describe -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop Sent: Friday, August 03, 2012 11:15 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] hathitrust You can do an empty query in their catalog, and use the Original Location facet to filter to a holding library. Programmatically, I'm not sure, but you'd probably need to use the Hathi files: http://www.hathitrust.org/hathifiles. -Jon On 08/03/2012 11:07 AM, Eric Lease Morgan wrote: If I needed/wanted to know what materials held by my library were also in the HathiTrust, then programmatically how could I figure this out? In other words, do you know of a way to query the HathiTrust and limit the results to items my library owns? --Eric Lease Morgan
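The approach the thread describes (take the search URL from the OpenSearch description and append view=rss) can be sketched in Python. The sample description document and its template URL below are invented for illustration; the real document lives at the catalog.hathitrust.org URL cited above and may differ:

```python
import xml.etree.ElementTree as ET

NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# A hypothetical OpenSearch description document, standing in for the one
# at http://catalog.hathitrust.org/Search/OpenSearch?method=describe.
# The template URL is invented for illustration.
sample_doc = """<?xml version="1.0"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Catalog</ShortName>
  <Url type="text/html" template="http://example.org/Search/Home?lookfor={searchTerms}"/>
</OpenSearchDescription>"""

def rss_url_from_description(xml_text, query):
    """Extract the HTML search template from an OpenSearch description,
    fill in the query, and append view=rss as suggested in the thread."""
    root = ET.fromstring(xml_text)
    for url in root.findall("os:Url", NS):
        if url.get("type") == "text/html":
            template = url.get("template")
            return template.replace("{searchTerms}", query) + "&view=rss"
    return None

print(rss_url_from_description(sample_doc, "dogs"))
```

Whether the resulting feed actually works is, per the message above, another matter.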
[CODE4LIB] 4 LCC Classes added to LC Linked Data Service
Announcement: 4 LC Classification Classes added to LC Linked Data Service The Library of Congress is pleased to make available the B, M, N, and Z Classes of the Library of Congress Classification (LCC) from the Library's Linked Data Service (ID.LOC.GOV). This effort not only provides URIs and resources for LCC schedules and tables required to synthesize a classification number, but also Linked Data resources for each derivable classification number and classification range within the entire Class hierarchy. A small ontology has been developed to accurately represent the semantics for LCC data; it will follow shortly. The publication of these LCC classes as Linked Data is presently a beta offering. As such, this announcement is limited to those groups and users who will most benefit from this offering, and from whom we anticipate we will likely receive the most valuable feedback at this time. Because LCC is sufficiently different from the other data available at ID.LOC.GOV, notably LCSH and Names, it is anticipated that more time and user feedback will be needed to fully work out any remaining issues and to maximize the data's usability. Indeed, we encourage the submission of use cases about how users would like to use the data. Because this is an area of active development, no bulk downloads of these classes are being published at this time. On the other hand, it is hoped that more LCC Classes will be added more quickly in the near future. ** Please explore it for yourself at http://id.loc.gov/authorities/classification ** Contact Us about ID: As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements. You can send comments or report any problems to us via the ID feedback form or ID listserv (see the web site). 
Background: The LC Linked Data Service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data environment. In part by assigning each vocabulary and each data value within it a unique resource identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as Simple Knowledge Organization System (SKOS), and the more detailed vocabulary MADS/RDF. In this way, the LC Linked Data Service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies. The new datasets join the term and code lists already available through the service: * Library of Congress Subject Headings (LCSH) * Library of Congress Children's Subject Headings * Library of Congress Genre/Form Terms * Library of Congress / NACO Name Authority File * Thesaurus of Graphic Materials * MARC Code List for Relators * MARC Code List for Countries (which reference their equivalent ISO 3166 codes) * MARC Code List for Geographic Areas * MARC Code List for Languages (which have been cross referenced with ISO 639-1, 639-2, and 639-5, where appropriate) * PREMIS vocabularies for Cryptographic Hash Functions, Preservation Events, and Preservation Level Roles The above code lists also contain links with appropriate LCSH and LC/NAF headings. -- Kevin Ford Network Development MARC Standards Office Library of Congress Washington, DC
[CODE4LIB] MARC Magic for file
I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authority Record. I'm running the most recent version of Ubuntu (12.04 - precise pangolin). I write because the inclusion of a file MARC21 specification rule in the magic.db stems from a Code4lib exchange that started in March 2011 [1] (it ends in April if you want to go crawling for the entire thread). Rgds, Kevin [1] https://listserv.nd.edu/cgi-bin/wa?A2=ind1103L=CODE4LIBT=0F=S=P=112728 -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC
Re: [CODE4LIB] MARC Magic for file
Does it work for bulk files? -- It passed on a file containing 215 MARC Bibs and on a file containing 2,574 MARC Auth records. Don't know if you consider these bulk, but there is more than 1 record in each file (caveat: file stops after evaluating the first line, so of the 2,574 Auth records, the last 2,573 could be invalid). It failed on a file containing all of LC Classification. I need to figure out why. Kevin, do you have examples of the output? -- I received MARC21 Bibliography and MARC21 Authority respectively. In theory, if Leader 20-23 are not 4500 then (non-conforming) should be appended to the identification. If requested, the mimetype - application/marc - should also be outputted. Rgds, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer Sent: Wednesday, May 23, 2012 3:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Magic for file Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... -Ross. On May 23, 2012, at 3:14 PM, Ford, Kevin wrote: I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authority Record. I'm running the most recent version of Ubuntu (12.04 - precise pangolin). I write because the inclusion of a file MARC21 specification rule in the magic.db stems from a Code4lib exchange that started in March 2011 [1] (it ends in April if you want to go crawling for the entire thread). Rgds, Kevin [1] https://listserv.nd.edu/cgi- bin/wa?A2=ind1103L=CODE4LIBT=0F=S=P=1 12728 -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC
Re: [CODE4LIB] Author authority records to create publication feed?
Hi Paul, I can't really offer any suggestions but to say that this is a problem area presently. In fact, there was a recent workshop, held in connection with the Spring CNI Membership Meeting, designed specifically to look at this problem (and author identity management more generally). You can read more about it from the announcement here [1], but the idea was to bring a number of the larger actors (Web of Science, arXiv, ORCID, ISNI, VIAF, LC/NACO, and a few more) involved in managing authorial identity together to learn about the work being done, and to discuss improved ways to disambiguate scholarly identities and then diffuse and share that information within and across the library and scholarly publishing realms. Clifford Lynch, who moderated the meeting, will publish a post-workshop report in a few weeks [2]. Perhaps of additional interest, [2] also contains a link to the report of a similar workshop held in London about international author identity. Initiatives like ISNI [3] and ORCID [4], which mint identifiers for (public, authorial) identities, and VIAF, which has done so much to aggregate the authority records of the participating libraries (while also assigning them an identifier), are essential to disambiguating one identity from another and assigning unique identifiers to those identities. For identifiers like ORCIDs, the faculty member's sponsoring organization might acquire the ORCID for him/her, after which the faculty member will/may know and use the identifier in situations such as grant applications, publishing, etc. (though it might also be early days for this activity). Part of the process, however, is diffusing the identifier across the library and scholarly publishing domains, all the while matching it with the correct identity (and identifier) in another system. That said, when ISNIs and ORCIDs and, perhaps, VIAF identifiers start to make their ways into Web of Science, arXiv, the LC/NACO file, and many other places, we - developers looking to create RSS feeds of author publications across services but without having to deal with same-name problems or variants - might then have the hook we need to generate RSS feeds for author publications from such services as JSTOR, EBSCO, arXiv, Web Of Science, etc. Alternatively, you'd have to get your faculty members to submit their entire publication history to academia.edu (as Ethan suggested), after which the community would have to request an RSS feed of that history, or an institutional repository (as Chad suggested), but I understand these types of things are an uphill battle with (often busy, underpaid) faculty. Cordially, Kevin [1] http://www.cni.org/news/cni-workshop-scholarly-id/ [2] https://mail2.cni.org/Lists/CNI-ANNOUNCE/Message/113744.html [3] http://www.isni.org/ [4] http://about.orcid.org/ -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Paul Butler (pbutler3) Sent: Friday, April 13, 2012 9:25 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Author authority records to create publication feed? Howdy All, Some folks from across campus just came to my door with this question. I am still trying to work through the possibilities and problems, but thought others might have encountered something similar. They are looking for a way to create a feed (RSS, or anything else that might work) for each faculty member on campus to collect and link to their publications, which can then be embedded into their faculty profile webpage (in WordPress). I realize the vendors (JSTOR, EBSCO, etc.) allow author RSS feeds, but that really does not allow for disambiguation between folks with the same name and variants in name citation. It appears Web of Science has author authority records and a set of apis, but we currently do not subscribe to WoS and am waiting for a trial to test. 
What we need is something similar to this: http://arxiv.org/help/author_identifiers We can ask faculty members to upload their own citations and then just auto link out to something like Serials Solutions' Journal Finder, but that is likely not sustainable. So, any suggestions - particularly free or low cost solutions. Thanks! Cheers, Paul +-+-+-+-+-+-+-+-+-+-+-+-+ Paul R Butler Assistant Systems Librarian Simpson Library University of Mary Washington 1801 College Avenue Fredericksburg, VA 22401 540.654.1756 libraries.umw.edu Sent from the mighty Dell Vostro 230.
[CODE4LIB] Bulk Download of Names Available
Bulk downloads of the Library of Congress *Name* Authority File (NAF) are now available. The current bulk download is only MADS/RDF. We'll make a SKOS/RDF download available in the near future. We are offering two serializations: n-triples and RDF/XML. The LC *Subject* Heading (LCSH) file continues to be available for download as SKOS/RDF, but now LCSH is also available in MADS/RDF (in the same serializations). The MADS/RDF enables the identification of types for subject headings and subheadings. They may be downloaded here: http://id.loc.gov/download/ The data dumps are very much a work in progress. Please report problems, issues, and wishes on the ID.LOC.GOV listserv: http://listserv.loc.gov/cgi-bin/wa?SUBED1=IDA=1 -- Kevin Ford Digital Project Coordinator Network Development MARC Standards Office Library of Congress 101 Independence Avenue, SE Washington, DC 20540-4402 Email: k...@loc.gov Tel: 202 707 3526
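The n-triples serialization is particularly friendly to the memory-constrained processing that came up earlier in these threads: each triple is one line, so a file can be scanned without ever loading it whole. A rough sketch, assuming one wants skos:prefLabel values (the regex is a quick filter, not a full n-triples parser, and the sample triple is illustrative):

```python
import io
import re

# Matches triples whose predicate is skos:prefLabel and whose object is a
# quoted literal; a quick filter, not a complete n-triples parser.
PREF_LABEL = re.compile(
    r'^(<[^>]+>) <http://www\.w3\.org/2004/02/skos/core#prefLabel> "((?:[^"\\]|\\.)*)"'
)

def iter_pref_labels(lines):
    """Stream (uri, label) pairs from an n-triples dump one line at a
    time, so a multi-hundred-megabyte file never sits in memory."""
    for line in lines:
        m = PREF_LABEL.match(line)
        if m:
            yield m.group(1).strip("<>"), m.group(2)

# Illustrative sample standing in for a line from the bulk download.
sample = io.StringIO(
    '<http://id.loc.gov/authorities/names/n79021164> '
    '<http://www.w3.org/2004/02/skos/core#prefLabel> "Twain, Mark, 1835-1910" .\n'
)
for uri, label in iter_pref_labels(sample):
    print(uri, label)
```

In practice one would pass `iter_pref_labels` a (possibly gzip-wrapped) file object over the downloaded dump and feed the yielded pairs into a local index.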
[CODE4LIB] Names Added to ID.LOC.GOV
Announcement: New Vocabulary Data Added to LC Authorities and Vocabularies Service The Library of Congress is pleased to make available additional vocabularies from its Authorities and Vocabularies web service (ID.LOC.GOV), which provides access to Library of Congress standards and vocabularies as Linked Data. The new dataset is: * Library of Congress Name Authority File (LC/NAF) In addition, the service has been enhanced to provide separate access to the following datasets which have been a part of the LCSH dataset access: * Library of Congress Genre/Form Terms * Library of Congress Children's Headings The LC/NAF data are published in RDF using the MADS/RDF and SKOS/RDF vocabularies, as are the other datasets. Individual concepts are accessible at the ID.LOC.GOV web service via a web browser interface or programmatically via content-negotiation. The vocabulary data are available for bulk download in MADS and SKOS RDF (the Name file and main LCSH file will be available by Friday, August 12). **Please explore it for yourself at http://id.loc.gov. ** Contact Us about ID: As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements. The addition of Names has resulted in considerable changes to the ID.LOC.GOV backend. Although we have endeavored to bring the service up with all pieces in place, please be patient as we work out any remaining kinks. You can send comments or report any problems to us via the ID feedback form or ID listserv (see the web site). Background: The Authorities and Vocabularies web service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data environment. 
In part by assigning each vocabulary and each data value within it a unique resource identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as Simple Knowledge Organization System (SKOS), and the more detailed vocabulary MADS/RDF. In this way, the Authorities and Vocabularies web service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies. The new datasets join the term and code lists already available through the service: * Library of Congress Subject Headings (LCSH) * Thesaurus of Graphic Materials * MARC Code List for Relators * MARC Code List for Countries (which reference their equivalent ISO 3166 codes) * MARC Code List for Geographic Areas * MARC Code List for Languages (which have been cross referenced with ISO 639-1, 639-2, and 639-5, where appropriate) * PREMIS vocabularies for Cryptographic Hash Functions, Preservation Events, and Preservation Level Roles The above code lists also contain links with appropriate LCSH and LC/NAF headings. Additional vocabularies will be added in the future, including additional PREMIS controlled vocabularies. -- Kevin Ford Digital Project Coordinator Network Development MARC Standards Office Library of Congress 101 Independence Avenue, SE Washington, DC 20540-4402 Email: k...@loc.gov Tel: 202 707 3526
Re: [CODE4LIB] TIFF Metadata to XML?
Exiftool [1] and trusty ImageMagick [2] will work. With ImageMagick it is as easy as: convert image.tiff image.xmp Members of the Visual Resources Association (VRA) have been working on/with embedded metadata for a few years now. There may be something more to glean from the working group's wiki [3]. Cordially, Kevin [1] http://www.sno.phy.queensu.ca/~phil/exiftool/ [2] http://www.imagemagick.org/script/index.php [3] http://metadatadeluxe.pbworks.com/w/page/20792238/FrontPage -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward M. Corrado Sent: Monday, July 18, 2011 9:18 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] TIFF Metadata to XML? Hello All, Before I re-invent the wheel or try many different programs, does anyone have a suggestion on a good way to extract embedded metadata added by cameras and (more importantly) photo-editing programs such as Photoshop from TIFF files and save it as XML? I have 60k photos that have metadata including keywords, descriptions, creator, and other fields embedded in them and I need to extract the metadata so I can load them into our digital archive. Right now, after looking at a few tools and doing a number of Google searches, I haven't found anything that seems to do what I want. As of now I am leaning towards extracting the metadata using exiv2 and creating a script (shell, perl, whatever) to put the fields I need into a pseudo-Dublin Core XML format. I say pseudo because I have a few fields that are not Dublin Core. I am assuming there is a better way. (Although part of me thinks it might be easier to do that than exporting to XML and using XSLT to transform the file since I might need to do a lot of cleanup of the data regardless.) Anyway, before I go any further, does anyone have any thoughts/ideas/suggestions? Edward
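The "pseudo-Dublin Core" step Edward describes (serialize whichever fields the extractor yields, keeping non-DC fields under local names) can be sketched independently of the extraction tool. The field names, sample values, and the small DC mapping below are illustrative assumptions, not anything prescribed in the thread:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def to_pseudo_dc(fields):
    """Serialize a dict of extracted image metadata as a small
    Dublin-Core-flavored XML record: fields with a DC equivalent get the
    dc: namespace, everything else keeps a local element name ('pseudo'
    DC, as described in the thread)."""
    ET.register_namespace("dc", DC)
    record = ET.Element("record")
    dc_terms = {"title", "creator", "description", "subject"}  # illustrative subset
    for name, value in sorted(fields.items()):
        tag = "{%s}%s" % (DC, name) if name in dc_terms else name
        ET.SubElement(record, tag).text = value
    return ET.tostring(record, encoding="unicode")

xml_out = to_pseudo_dc({
    "creator": "Jane Photographer",      # invented sample values
    "description": "Campus quad, spring",
    "camera_model": "Example D100",      # no DC equivalent -> local element
})
print(xml_out)
```

The dict would come from whatever extractor is used (exiv2, exiftool, ImageMagick); the serialization step stays the same either way.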
Re: [CODE4LIB] source of marc geographic code?
The GeographicArea codes have been available from [1] in XML [2] since at least late 2007 [3]. I can't say with 100% certainty that the XML structure has remained perfectly consistent since 2007, but eyeballing the 2007 version and comparing it to the currently available file suggests that the structure has remained consistent. The GACS codes are also available from ID, as has been pointed out. The entire list is available for download at [4]. Let me acknowledge, though, that the labels for the URIs (incidentally, the GACS code is the last token of the URI) are not part of the RDF/N-triples/JSON at [5]. This sounds like a feature request - and a useful one at that. Would that be an accurate interpretation of this thread? Cordially, Kevin -- Network Development MARC Standards Office [1] http://www.loc.gov/marc/geoareas/gacshome.html [2] http://www.loc.gov/standards/codelists/gacs.xml [3] http://web.archive.org/web/20071129170212/http://www.loc.gov/marc/geoareas/ [4] http://id.loc.gov/download/ [5] http://id.loc.gov/vocabulary/geographicAreas.html From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind [rochk...@jhu.edu] Sent: Wednesday, June 22, 2011 21:43 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] source of marc geographic code? The result was that a few meetings later LC announced that they had coded the MARC online pages in XML, and were generating the HTML from that. I think I was misunderstood. No doubt, but man if they'd then just SHARE that XML with us at a persistent URL, and keep the structure of that XML the same, that'd be really useful!
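Since the message notes that the GACS code is the last token of the URI, deriving codes from a list of URIs is a one-liner. The example URI below follows the vocabulary/geographicAreas pattern and is my assumption, not taken from the message:

```python
def gac_code_from_uri(uri):
    """Peel the GAC code off an id.loc.gov URI; per the note above, the
    code is simply the last path token."""
    return uri.rstrip("/").rsplit("/", 1)[-1]

# Example URI pattern assumed for illustration.
print(gac_code_from_uri("http://id.loc.gov/vocabulary/geographicAreas/n-us"))
```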
Re: [CODE4LIB] LCSH and Linked Data
Actually, it appears to depend on whose Authority record you're looking at. The Canadians, Australians, and Israelis have it as a CorporateName (110), as do the French (210 - unimarc); LC and the Germans say it's a Geographic Name. In the case of LCSH, therefore, it would be a 151. Regardless, it is in VIAF. Warmly, Kevin From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of LeVan,Ralph [le...@oclc.org] Sent: Thursday, April 07, 2011 11:34 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LCSH and Linked Data If you look at the fields those names come from, I think they mean England as a corporation, not England as a place. Ralph -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen Stephens Sent: Thursday, April 07, 2011 11:28 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LCSH and Linked Data Still digesting Andrew's response (thanks Andrew), but On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote: *Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org*. *[YZ]* viaf.org does not include geographic names. I just checked there England. Is this not the relevant VIAF entry http://viaf.org/viaf/142995804 -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] MARC magic for file
I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I received line too long errors. But, since I've been curious about this for some time, I figured I'd take a whack at it myself. Try this:

#
# MARC 21 Magic (Second cut)
#
# Set at position 0
0 short 0x
# leader ends with 4500
20 string 4500
# leader starts with 5 digits, followed by codes specific to MARC format
0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
0 regex/1 (^[0-9]{5})[cdn][q] MARC Community

I've also attached it to this email to preserve the tabs. In any event, I can confirm it works on MARC Bib, MARC Authority, and MARC Classification files I have bumping around my computer. I've not tested it on MARC Holdings and MARC Community. Do let us/me know if it works for you (and the community generally). I can see about submitting it for formal inclusion in the magic file. Warmly, Kevin -- Library of Congress Network Development and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Thursday, March 24, 2011 12:28 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Some of the problems in your first cut are:
1. Offsets for regex are given in terms of lines. MARC files don't have newlines in them, unless you're Millennium, in which case they can be inserted every 200,000 bytes to keep things interesting.
2. Byte matches match byte values, so 20 byte 4 is looking for the binary value, not the ascii digit.
3. Sometimes you need to prime the buffer before you can do a regexp match.
Is this good enough? 
# MARC 21 Magic (First cut)
# indicator count must be 2
10  string  2
# leader must end in 4500
20  string  4500
# leader must start with five digits, a record status, and a record type
0   regex   ^([0-9]{5})[acdnp][acdefgijkmoprt][abcims]  MARC Bibliographic
0   regex   ^([0-9]{5})[acdnp][z]                       MARC Authority

Simon

On Wed, Mar 23, 2011 at 8:09 PM, William Denton w...@pobox.com wrote: Has anyone figured out the magic necessary for file to recognize MARC files? If you don't know it, file is a Unix command that tells you what kind of file a file is. For example:

$ file 101015_001.mp3
101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, Stereo
$ file P126.jpg
P126.jpg: JPEG image data, EXIF standard, comment: AppleMark

It's a really useful command. I assume it's on OSX, but I don't know. You can get it for Windows with Cygwin. The problem is, file doesn't grok MARC:

$ file catalog.01.mrc
catalog.01.mrc: data

I took a stab at getting the magic defined, but it didn't work. I'll include what I used below. You can put it into a magic.txt file, and then use "file -m magic.txt some_file.mrc" to test it. It'll tell you the file is MARC Bibliographic ... but it also thinks that PDFs, JPEGs, and text files are MARC. That's no good. It'd be great if the MARC magic got into the central magic database so everyone would be able to recognize various MARC file types. Bill

# --- clip'n'test
# MARC 21 for Bibliographic Data
# http://www.loc.gov/marc/bibliographic/bdleader.html
#
# This doesn't work properly
0   string  x
5   regex   [acdnp]
6   regex   [acdefgijkmoprt]
7   regex   [abcims]
8   regex   [\ a]
9   regex   [\ a]
10  byte    x
11  byte    x
12  string  x
17  regex   [\ 12345678uz]
18  regex   [\ aciu]
19  regex   [\ abc] MARC Bibliographic
#20 byte    4
#21 byte    5
#22 byte    0
#23 byte    0   MARC Bibliographic
# --- end clip'n'test

-- William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org

Attachment: marc.magic
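For anyone who wants the same checks outside of file(1), the second-cut rules above boil down to a handful of leader-byte tests. Here is a minimal Python sketch of that logic; it is my own illustration for this thread, not part of the magic file itself:

```python
# Sketch: classify a MARC 21 record by its leader, mirroring the
# "second cut" magic rules (5-digit length, then status and type bytes).
import re

# (pattern, label) pairs transcribed from the magic rules above.
RULES = [
    (r"^[0-9]{5}[acdnp][^bhlnqsu-z]", "MARC Bibliographic"),
    (r"^[0-9]{5}[acdnosx]z",          "MARC Authority"),
    (r"^[0-9]{5}[cdn][uvxy]",         "MARC Holdings"),
    (r"^[0-9]{5}[acdn]w",             "MARC Classification"),
    (r"^[0-9]{5}[cdn]q",              "MARC Community"),
]

def classify(leader: str) -> str:
    """Return the MARC format name for a leader string, or 'data'."""
    for pattern, label in RULES:
        if re.match(pattern, leader):
            return label
    return "data"  # same fallback answer file(1) gives for unknown bytes
```

Note that the character classes are disjoint on the record-type byte (the Bibliographic rule excludes q, w, u-z, and z), so rule order does not matter; anything unmatched falls through to "data".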
[CODE4LIB] New Vocabs Added to ID.LOC.GOV
Announcement: New Vocabularies Added to LC Authorities and Vocabularies Service

The Library of Congress is pleased to make available new vocabularies from its Authorities and Vocabularies web service (ID.LOC.GOV), which provides access to Library of Congress standards and vocabularies as Linked Data. The new additions include:

MARC Code List for Countries
MARC Code List for Geographic Areas
MARC Code List for Languages

The MARC Countries entries include references to their equivalent ISO 3166 codes. The MARC Languages have been cross-referenced with ISO 639-1, 639-2, and 639-5, where appropriate. Additional vocabularies will be added in the future, including additional PREMIS controlled vocabularies. The vocabulary data are published in RDF using the SKOS/RDF vocabulary. Individual concepts are accessible from the ID.LOC.GOV web service via a web browser interface or programmatically via content negotiation. The vocabulary data are also available for bulk download. A new bulk download of LCSH will be available tomorrow, 5 January 2011.

As always, your feedback is important and welcomed. Though we are interested in all forms of constructive commentary on all topics related to ID, we're particularly interested in how the data available from ID.LOC.GOV is used. Your contributions directly inform service enhancements.

The Authorities and Vocabularies web service was first made available in May 2009 and offered the Library of Congress Subject Headings (LCSH), the Library's initial entry into the Linked Data movement. In part by assigning each vocabulary and each data value within it a Uniform Resource Identifier (URI), the service provides a means for machines to semantically access, use, and harvest authority and vocabulary data that adheres to W3C recommendations, such as the Simple Knowledge Organization System (SKOS).
In this way, the Authorities and Vocabularies web service also makes government data publicly and freely available in the spirit of the Open Government directive. Although the primary goal of the service is to enable machine access to Library of Congress data, a web interface serves human users searching and browsing the vocabularies. Please explore it for yourself at http://id.loc.gov.

*
Kevin M. Ford
Digital Project Coordinator
Network Development and MARC Standards Office
Library of Congress
101 Independence Avenue, SE
Washington, DC 20540-4402
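As a rough illustration of the programmatic access via content negotiation mentioned in the announcement, the sketch below builds an HTTP request that asks for an RDF serialization. The concept URI and media type here are my own examples, not documented guarantees of the service:

```python
# Sketch: request an RDF serialization of an id.loc.gov concept by
# sending an Accept header (content negotiation). The URI and media
# type below are illustrative assumptions.
from urllib.request import Request

def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build an HTTP request whose Accept header prefers an RDF format."""
    return Request(uri, headers={"Accept": media_type})

# Hypothetical MARC Countries concept URI; pass the Request object to
# urllib.request.urlopen() to actually fetch the representation.
req = rdf_request("http://id.loc.gov/vocabulary/countries/fr")
```

The same function can request other serializations (e.g. a SKOS/N-Triples media type) simply by changing the `media_type` argument.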
[CODE4LIB] MADS/RDF for review
Announcement: MADS/RDF for review

A MADS/RDF ontology developed at the Library of Congress is available for a public review period until Jan. 14, 2011. The MADS/RDF (Metadata Authority Description Schema in RDF) vocabulary is a data model for authority and vocabulary data used within the library and information science (LIS) community, which is inclusive of museums, archives, and other cultural institutions. It is presented as an OWL ontology. Documentation and the ontology are available at: http://www.loc.gov/standards/mads/rdf/

Based on the MADS/XML schema, MADS/RDF provides a means to record data from the Machine-Readable Cataloging (MARC) Authorities format in RDF for use in semantic applications and Linked Data projects. MADS/RDF is a knowledge organization system designed for use with controlled values for names (personal, corporate, geographic, etc.), thesauri, taxonomies, subject heading systems, and other controlled value lists. It is closely related to SKOS, the Simple Knowledge Organization System, a widely supported and adopted RDF vocabulary. Unlike SKOS, however, which is very broad in its application, MADS/RDF is designed specifically to support authority data as used by and needed in the LIS community and its technology systems. Given the close relationship between the aim of MADS/RDF and the aim of SKOS, the MADS ontology has been fully mapped to SKOS.

Community feedback is encouraged and welcomed. The MODS listserv - MADS/XML is maintained as part of the community work on MODS (Metadata Object Description Schema) - is the preferred forum for feedback: http://listserv.loc.gov/listarch/mods.html (send mail to: m...@listserv.loc.gov). Kevin Ford, the primary architect of the model, will be responding on that forum in order to have an open discussion.

*
Kevin M. Ford
Digital Project Coordinator
Network Development and MARC Standards Office
Library of Congress
101 Independence Avenue, SE
Washington, DC 20540-4402
Re: [CODE4LIB] dc:identifier in Google XML
Dear David,

I believe they're codes for universities. UCSC is probably Univ of Calif Santa Cruz. UOM is University of Michigan. (You'll see STANFORD and OCLC in the results also, though OCLC is not a university). I tracked one of the items in the ATOM feed to the UM record: http://mirlyn.lib.umich.edu/Record/000680081/Details#tabs The ID you see in the ATOM feed is buried in one of the 974 fields of the UM MARC record.

HTH,
Kevin

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of David Kane [dk...@wit.ie] Sent: Monday, July 19, 2010 12:55 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] dc:identifier in Google XML

Hi All, I am getting data from Google Books that I do not understand in the dc:identifier field. I understand ISBN:, ISSN:, LCCN:, OCLC:, but UOM: and UCSC:? Can anyone help with what these two mean. Are they universities? Here is a snippet of XML:

<dc:format>book</dc:format>
<dc:identifier>r0xMMAAJ</dc:identifier>
<dc:identifier>UOM:39015035700759</dc:identifier>
<dc:subject>Medical</dc:subject>
<dc:title>Abstracts [of the] annual meeting</dc:title>
...

generated from this URL: http://www.google.com/books/feeds/volumes?q=Abstracts%20of%20the%20annual%20meeting

Thanks, David.
--
David Kane, MLIS.
Systems Librarian
Waterford Institute of Technology
Ireland
http://library.wit.ie/
T: ++353.51302838 M: ++353.876693212
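For anyone who wants to pull these identifier prefixes out programmatically, here is a small sketch parsing an entry like the one David posted. The XML is a hand-built stand-in for his snippet, and the Dublin Core namespace URI is my assumption about the feed, not something confirmed in this thread:

```python
# Sketch: extract identifier prefixes (ISBN, OCLC, UOM, ...) from the
# dc:identifier elements of a Google Books feed entry.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/terms"  # assumed Dublin Core namespace

# Hand-built stand-in for the entry shown in the thread.
entry = """<entry xmlns:dc="http://purl.org/dc/terms">
  <dc:identifier>r0xMMAAJ</dc:identifier>
  <dc:identifier>UOM:39015035700759</dc:identifier>
</entry>"""

def identifier_prefixes(xml_text: str) -> list:
    """Return the prefix of each dc:identifier, or None for bare values."""
    root = ET.fromstring(xml_text)
    prefixes = []
    for ident in root.findall("{%s}identifier" % DC):
        prefix, sep, _ = ident.text.partition(":")
        # A value with no colon is the Google volume ID itself.
        prefixes.append(prefix if sep else None)
    return prefixes
```

Running `identifier_prefixes(entry)` on the stand-in yields one `None` (the bare volume ID) and one `"UOM"`, matching David's observation.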
Re: [CODE4LIB] Any web services that can help sort out this for me.
Following on Dave's recommendation, you could also use Google Books' Data API [1]. Search for the book, get a structured ATOM feed as a response, presume the first hit is your book, and then follow the ATOM feed link for that book's metadata. It isn't going to be perfect; I'd be interested to know the end ratio of perfect versus missed matches. Good luck, Kevin

[1] http://code.google.com/apis/books/docs/gdata/developers_guide_protocol.html

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Dave Caroline [dave.thearchiv...@gmail.com] Sent: Thursday, June 17, 2010 5:43 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Any web services that can help sort out this for me.

what definition of large list 10,100,1000,. yes google copy title part Progress in Smart Materials and Structures paste in google box press return first hit for the first line has the isbn, or you could script it and use the Open Library API and get the isbn back possibly Dave Caroline

On Thu, Jun 17, 2010 at 9:59 AM, David Kane dk...@wit.ie wrote: Hi, I have large amounts of data like this:

<yawn>
Reece, P. L., (2006), Progress in Smart Materials and Structures, Nova
Ghosh, S. K., (2008), Self-healing materials: fundamentals, design strategies and applications, Wiley
A.Y.K. Chan, Biomedical Device Technology: Principles Design, Charles C. Thomas, 2008.
L.J. Street, Introduction to Biomedical Engineering Technology, CRC Press, 2007.
</yawn>

... one book per line. they are not in any order. I am lazy. So, is there a web service out there that I can throw this stuff at to organise it for me and ideally find the ISBNs. Long shot, I know. But thanks, David.
--
David Kane
Systems Librarian
Waterford Institute of Technology
Ireland
http://library.wit.ie/
davidfk...@googlewave.com
T: ++353.51302838 M: ++353.876693212
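The lookup loop Kevin describes can be sketched as follows. The feed URL comes from the thread itself; everything else is illustrative, and note that this Data API has long since been retired, so treat this purely as a sketch of the approach:

```python
# Sketch: build a Google Books volumes-feed query URL for one citation
# line; the caller would fetch it and take the first <entry> as the match.
from urllib.parse import urlencode

# Feed endpoint taken from the URL quoted earlier in the thread.
FEED = "http://www.google.com/books/feeds/volumes"

def volumes_query_url(citation: str) -> str:
    """URL that searches the feed for a free-text citation string."""
    return FEED + "?" + urlencode({"q": citation})

# One line from David's reading list, ready to fetch.
url = volumes_query_url("Progress in Smart Materials and Structures")
```

Looping this over the "one book per line" list and recording which entries carry an ISBN: prefix would also answer Kevin's question about the hit/miss ratio.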
Re: [CODE4LIB] XForms EAD editor sandbox available
We've been using Orbeon Forms for about a year now for cataloging our digital collections. We use Fedora Commons, so using XML as input and outputting XML seemed a no-brainer. It has worked very nicely for editing VRA Core 4 records. But, instead of doing anything terribly fancy with Orbeon, we simply use the little sandbox application that comes with Orbeon (there's an online demo [1]). The URL to the XForm is part of the query string. This solution has greatly reduced our time investment in making Orbeon part of our workflow and, more importantly, in getting Orbeon to work for us. All that being said, Ethan's sharp-looking EAD editor makes me jealous that we haven't created our own custom editor.

As for Orbeon's performance, once we worked out some quirks, we've been quite happy with Orbeon. Orbeon hosts a useful performance and tuning page [2]. We also learned that it is helpful to stop the Orbeon app and restart it about once every two weeks, as performance can become progressively slower. It seems to need a little reboot. In any event, a typical XForm for us is about 200k, with a number of authority lists, one of which includes nearly 1500 items. Orbeon loads and renders the XForm fairly quickly (less than 4 seconds), and editing performance hasn't been an issue either, which is great considering that a 1500-item subject-authority drop-down list is created for each subject being added to a record. Moving such a large XForm to a server-based solution was necessary. Our XForm cataloging application, which began with a simple DC record and focused on producing a viable XForm, initially used the Mozilla XForms add-on [3]. The Firefox add-on, which of course runs on the client, easily scaled for a VRA Core 4 record, but it couldn't handle a burgeoning subject authority file. Hence the need for an alternative solution, quick.
-Kevin

[1] http://www.orbeon.com/ops/xforms-sandbox/
[2] http://wiki.orbeon.com/forms/doc/developer-guide/performance-tuning
[3] http://www.mozilla.org/projects/xforms/

--
Kevin Ford
Library Digital Collections
Columbia College Chicago

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Andrew Ashton Sent: Friday, November 13, 2009 8:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] XForms EAD editor sandbox available

Nice job, Ethan. This looks really cool. We have an Orbeon-based MODS editor, but I have found Orbeon to be a bit tough to develop/maintain and more heavyweight than we really need. We're considering more XForms implementations, but I would love to find a more lightweight XForms application. Does anyone have any recommendations? The only one I know of is XSLTForms (http://www.agencexml.com/xsltforms) but I haven't messed with it yet. -Andy

On 11/13/09 9:13 AM, Eric Hellman e...@hellman.net wrote: XForms and Orbeon are very interesting tools for developing metadata management tools. The ONIX developers have used this stack to produce an interface for ONIX-PL called OPLE that people should try out. http://www.jisc.ac.uk/whatwedo/programmes/pals3/onixeditor.aspx Questions about Orbeon relate to performance and integrability, but I think it's an impressive use of XForms nonetheless. - Eric

On Nov 12, 2009, at 1:30 PM, Ethan Gruber wrote: Hello all, Over the past few months I have been working on and off on a research project to develop an XForms, web-based editor for EAD finding aids that runs within the Orbeon Tomcat application. While still in a very early alpha stage (I have probably put only 60-80 hours of work into it thus far), I think that it's ready for a general demonstration to solicit opinions, criticism, etc. from librarians and technical staff. Background: For those not familiar with XForms, it is a W3C standard for creating next-generation forms.
It is powerful and can allow you to create XML in the way that it is intended to be created, without limits to repeatability, complex hierarchies, or mixed content. Orbeon adds a level on top of that, taking care of all the Ajax calls, serialization, CRUD operations, and a variety of widgets that allow nice features like tabs and autocomplete/autosuggest that can be bound to authority lists and controlled access terms. By default, Orbeon reads and writes data from and to an eXist database that comes packaged with it, but you can have it serialize the XML to disk or have it interact with any REST interface such as Fedora. Goals: Ultimately, I wish to create a system of forms that can open any EAD 2002-compliant XML file without any data loss or XML transformation whatsoever. I think that this is the shortcoming of systems such as Archon and Archivists' Toolkit. I want authority lists that can be integrated into certain fields with autosuggest (such as corporate names, people, and subjects). If there is demand, I can build a public interface for