Re: [CODE4LIB] library of congress call number subject coding
yes, that works, thanks Bilal...very impressive regards, dana On Wed, Sep 3, 2014 at 11:06 AM, Bilal Khalid bilal.kha...@utoronto.ca wrote: Apologies! Here's a link that should be more durable: http://www.library.utoronto.ca/bilal/lc_dimension.xml Regards, -Bilal -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson Sent: Tuesday, September 02, 2014 8:53 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] library of congress call number subject coding Hi Bilal, sounds very interesting but the link does not connect to anything don't have an immediate need but i work with XSL, MARCXML and would be fun to experiment regards, dana On Tue, Sep 2, 2014 at 4:24 PM, Bilal Khalid bilal.kha...@utoronto.ca wrote: Hi Ken, Here's a link to an XML mapping of LC call numbers ranges to categories that we use in an indexing software. It may be a bit hefty for your needs (almost 6000 mappings), but hope it helps! http://bilalk.library.utoronto.ca/lc_dimension.xml Cheers, -Bilal -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken Irwin Sent: Tuesday, September 02, 2014 4:42 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] library of congress call number subject coding Hi folks, Does anyone have a handy scheme for coding LC call numbers into just a few broad subject areas (e.g. Arts, Humanities, Sciences, Social Sciences) or perhaps something only a little more granular than that? I'm hoping for a list that will turn 1-3 letter LC classes into subject groups, and I'd rather not reinvent the wheel if someone's already got something. Any leads? Thanks Ken -- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries -- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
Re: [CODE4LIB] library of congress call number subject coding
Hi Bilal,

Sounds very interesting, but the link does not connect to anything. I don't have an immediate need, but I work with XSL and MARCXML, and it would be fun to experiment.

regards, dana

On Tue, Sep 2, 2014 at 4:24 PM, Bilal Khalid bilal.kha...@utoronto.ca wrote:

Hi Ken, Here's a link to an XML mapping of LC call number ranges to categories that we use in indexing software. It may be a bit hefty for your needs (almost 6000 mappings), but hope it helps! http://bilalk.library.utoronto.ca/lc_dimension.xml Cheers, -Bilal

-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken Irwin Sent: Tuesday, September 02, 2014 4:42 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] library of congress call number subject coding

Hi folks, Does anyone have a handy scheme for coding LC call numbers into just a few broad subject areas (e.g. Arts, Humanities, Sciences, Social Sciences) or perhaps something only a little more granular than that? I'm hoping for a list that will turn 1-3 letter LC classes into subject groups, and I'd rather not reinvent the wheel if someone's already got something. Any leads? Thanks Ken

-- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
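For anyone wanting to try Ken's idea without the full 6000-row mapping, here is a minimal Python sketch of turning the 1-3 leading letters of an LC call number into a broad subject group. The grouping table is an assumption made up for illustration (it is not taken from Bilal's lc_dimension.xml or any published scheme), and the function name broad_area is hypothetical. The lookup tries the longest letter prefix first so that a subclass such as TR can be filed differently from its parent class T.

```python
import re

# Illustrative grouping only: these assignments are assumptions for the sketch,
# not an authoritative mapping. Adjust to local needs.
PREFIX_TO_AREA = {
    "M": "Arts", "N": "Arts", "TR": "Arts",
    "B": "Humanities", "C": "Humanities", "D": "Humanities",
    "E": "Humanities", "F": "Humanities", "P": "Humanities",
    "BF": "Social Sciences", "G": "Social Sciences", "H": "Social Sciences",
    "J": "Social Sciences", "K": "Social Sciences", "L": "Social Sciences",
    "Q": "Sciences", "R": "Sciences", "S": "Sciences", "T": "Sciences",
}

LEADING_LETTERS = re.compile(r"\s*([A-Za-z]{1,3})")


def broad_area(call_number, default="Other"):
    """Return the broad subject group for an LC call number string."""
    match = LEADING_LETTERS.match(call_number)
    if not match:
        return default
    letters = match.group(1).upper()
    # Longest prefix wins, so "TR" is checked before "T".
    for length in range(len(letters), 0, -1):
        area = PREFIX_TO_AREA.get(letters[:length])
        if area is not None:
            return area
    return default


if __name__ == "__main__":
    for cn in ["PQ2227 .V5 2006", "TR145", "QA76.73", "Z699"]:
        print(cn, "->", broad_area(cn))
```

Classes not listed in the table (A, U, V, Z here) fall through to the default, which makes gaps in the mapping easy to spot.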
Re: [CODE4LIB] metadata for free ebook repositories
Hi Stuart,

I've done RDF/DC to MARC for the Gutenberg Project. It requires a lot of cleanup, especially for subject heading strings: LCSH may well appear in a DC element, but it needs to be parsed into MARC subfields. That is tedious, and human intervention was required in the case of the Gutenberg Project.

I'm close to finishing the editing of about 4000 records harvested in late December 2013, about 16 months after an initial harvest of about 40,000. The RDF/DC had changed somewhat, and it seemed to carry significantly fewer subject headings. I decided to examine virtually every item and to find better records at the Library of Congress or, more frequently, the Internet Archive [ archive.org/details/texts ].

I fully agree on how important this is, but I don't think I'll do it again since it consumes all my free time. Maybe if others could volunteer to do the editing, I could continue harvesting. Only a download of the complete collection is possible, but I use XSL to select records based on date added.

The collections you mention are worthy of being included in library systems. Metadata quality is a limiting factor.

regards, dana

On Mon, Aug 18, 2014 at 5:04 PM, Stuart Yeates stuart.yea...@vuw.ac.nz wrote:

There are a stack of great free ebook repositories available on the web, things like https://unglue.it/ http://www.gutenberg.org/ https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/ https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc What there doesn't appear to be, is high-quality AACR2 / RDA records available for these. There are things like https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate dublin core to MARC converters, but these lack standardisation of names, authority control (people, entities, places, etc), interlinking, etc. It seems to me that quality metadata would greatly increase the value / findability / use of these projects and thus their visibility and available sources. Are there any projects working in this space already?
Are there suitable tools available? cheers stuart -- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
Re: [CODE4LIB] metadata for free ebook repositories
Karen,

It seems to me that the Open Library would want to broaden use of this great collection as much as possible. Yet MARC records for the 1/3 or so of the items in the collection cannot be downloaded so that they could be imported into local library systems. Lots of users search their local library catalogs and might well use Google, Open Library, or the Internet Archive to find ebooks less frequently.

I'll look at Tom Morris's code to see if I might automate selection of Open Library records by comparing them with elements of the MARCXML records for this last group of Gutenberg Project additions. Thanks for that information.

regards, dana

On Mon, Aug 18, 2014 at 6:57 PM, Karen Coyle li...@kcoyle.net wrote:

About 1/3 of the 1M ebooks on OpenLibrary.org have full MARC records, and you can retrieve the record via the API. There is also a secret record format that returns not the full MARC for the hard copy (which is what the records represent because these are digitized books) but a record that has been modified to represent the ebook.

The MARC records for the hard copy follow the pattern:
https://archive.org/download/[archive identifier]/[archive identifier]_marc.[xml|mrc]
Download MARC XML: https://archive.org/download/myantonia00cathrich/myantonia00cathrich_marc.xml
Download MARC binary: https://archive.org/download/myantonia00cathrich/myantonia00cathrich_meta.mrc

To get the one that represents the ebook, do:
https://archive.org/download/[archive identifier]/[archive identifier]_archive_marc.xml
https://archive.org/download/myantonia00cathrich/myantonia00cathrich_archive_marc.xml
This one has an 007, the 245 $h, and a few other things.

Tom Morris did some code that helps you search for books by author and title and retrieve a MARC record. I don't recall where his github archive is, but I'll find out and post it here. The code is open source.
We used it for a project that added ebook records to a public library catalog. You can also use the OpenLibrary API to select all open access ebooks. What I'd like to see is a way to create a list or bibliography in OL that is then imported into a program that will find MARC records for those books. The list function is still under development, though. kc

On 8/18/14, 3:04 PM, Stuart Yeates wrote:

There are a stack of great free ebook repositories available on the web, things like https://unglue.it/ http://www.gutenberg.org/ https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/ https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc What there doesn't appear to be, is high-quality AACR2 / RDA records available for these. There are things like https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate dublin core to MARC converters, but these lack standardisation of names, authority control (people, entities, places, etc), interlinking, etc. It seems to me that quality metadata would greatly increase the value / findability / use of these projects and thus their visibility and available sources. Are there any projects working in this space already? Are there suitable tools available? cheers stuart

-- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: +1-510-435-8234 skype: kcoylenet/+1-510-984-3600

-- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
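The download patterns Karen describes can be wrapped in a small helper. This is a sketch only: the function name marc_urls and the dictionary keys are made up for illustration, and it simply builds the URLs from her example links without checking that the files exist. Note that her stated pattern says _marc.[xml|mrc] while her binary example link ends in _meta.mrc; the sketch follows the example link.

```python
def marc_urls(identifier):
    """Build the three MARC download URLs for an Internet Archive identifier.

    URL shapes are copied from the example links in the message above;
    nothing here verifies that the files are actually present.
    """
    base = f"https://archive.org/download/{identifier}/{identifier}"
    return {
        "print_marcxml": base + "_marc.xml",          # record for the hard copy
        "print_marc_binary": base + "_meta.mrc",      # binary form, per the example link
        "ebook_marcxml": base + "_archive_marc.xml",  # record modified for the ebook
    }


if __name__ == "__main__":
    for label, url in marc_urls("myantonia00cathrich").items():
        print(label, url)
```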
Re: [CODE4LIB] Excel to XML
I don't use Excel, but a client did who wanted to use XSL I had created (ONIX to MARC) to transform bibliographic metadata in Excel to XML. The built-in Excel XML converter was not very helpful, since empty cells were skipped, so it was impossible to use that result. There is an add-on that allows you to map your data to XML elements by creating a schema, which is pretty cool. http://bit.ly/1jpwtqM This might be helpful.

regards, dana

On Fri, Jun 13, 2014 at 6:53 PM, Terry Brady tw...@georgetown.edu wrote:

The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse.

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7" x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
    <Cell><Data ss:Type="String">question 1</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 3</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 4</Data></Cell>
    <Cell><Data ss:Type="String">question 2</Data></Cell>
    <Cell><Data ss:Type="String">answer 1</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 5</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"><Data ss:Type="String">row 6</Data></Cell>
    <Cell><Data ss:Type="String">quest</Data></Cell>
    <Cell><Data ss:Type="String">answer 3</Data></Cell>
  </Row>
  <Row>
    <Cell ss:StyleID="s62"/>
  </Row>
</Table>

On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote:

Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML.
I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured:

Row1    Question1    Q1Answer1
Row2                 Q1Answer2
Row3                 Q1Answer3
Row4    Question2    Q2Answer1
Row5                 Q2Answer2
Row6    Question3    Q3Answer1
etc.

How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all-around solution? Excel's built-in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly. In the interest of imminent deadlines, I have handed the spreadsheet off to my Perl-writing colleague. But as a professional growth opportunity, I'm interested in suggestions from Libraryland about ways others have approached this successfully. Thanks! Ryan Engel Web Stuff UW-Madison

-- Terry Brady Applications Programmer Analyst Georgetown University Library Information Technology https://www.library.georgetown.edu/lit/code 425-298-5498

-- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
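Since Ryan mentions the CSV-plus-script route, here is a minimal Python sketch of it, assuming the spreadsheet has been saved as a three-column CSV (row label, question, answer) in which a blank question cell means "same question as the row above". The element names questions, question, and answer, the text attribute, and the function names are all invented for illustration; a Drupal import would need whatever names its importer expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data mirroring the structure described in the message above.
SAMPLE_CSV = """Row1,Question1,Q1Answer1
Row2,,Q1Answer2
Row3,,Q1Answer3
Row4,Question2,Q2Answer1
Row5,,Q2Answer2
Row6,Question3,Q3Answer1
"""


def rows_to_xml(rows):
    """Group answer rows under the most recent non-empty question cell."""
    root = ET.Element("questions")
    current = None
    for row in rows:
        if not row:
            continue  # skip blank lines
        # Pad short rows so unpacking never fails.
        _label, question, answer = [c.strip() for c in (list(row) + ["", "", ""])[:3]]
        if question:
            current = ET.SubElement(root, "question", {"text": question})
        if current is None:
            raise ValueError("answer row appears before any question")
        if answer:
            ET.SubElement(current, "answer").text = answer
    return root


def csv_to_xml(text):
    """Parse CSV text and return the root element of the grouped XML tree."""
    return rows_to_xml(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(ET.tostring(csv_to_xml(SAMPLE_CSV), encoding="unicode"))
```

Carrying the last question forward is what handles the one-to-many relationship that Excel's own XML mapping could not express.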
Re: [CODE4LIB] Python and Ruby
Josh,

I work exclusively with XSLT, but I specialize in metadata only, so there is no need for content display choices. Maybe it's a candidate for a library programming language... XSLT 2.0 has a useful analyze-string element to cover Roy's point.

By the way, Josh, I live just down the road in Leeton.

regards, dana

On Mon, Jul 29, 2013 at 12:04 PM, Roy Tennant roytenn...@gmail.com wrote:

On Mon, Jul 29, 2013 at 9:57 AM, Peter Schlumpf pschlu...@earthlink.net wrote: Imagine if the library community had its own programming/scripting language, at least one that is domain relevant. What would it look like?

Whatever else it had, it would have to have a sophisticated way to inspect text for patterns -- that is, regular expressions. Roy

-- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks Owen,

I conflated GitHub and Dropbox in my earlier summary and left out any reference to Dropbox. They are the ones with the email requirement. Sorry, it was late and a hurried summary. I will look again for that download option on GitHub.

thanks again, dana

On Thu, Jun 13, 2013 at 9:09 AM, Owen Stephens o...@ostephens.com wrote:

On 13 Jun 2013, at 02:57, Dana Pearson dbpearsonm...@gmail.com wrote: quick followup on the thread.. github: I looked at the cooperhewitt collection but don't see a way to download the content...I could copy and paste their content but that may not be the best approach for my files...documentation is thin, seems i would have to provide email addresses for those seeking access...but clearly that is not the case with how the cooperhewitt archive is configured.. My primary concern has been to make it as simple a process as possible for libraries which have limited technical expertise.

I suspect from what you say that GitHub is not what you want in this case. However, I just wanted to clarify that you can download files as a Zip file (e.g. for Cooper Hewitt https://github.com/cooperhewitt/collection/archive/master.zip), and that this link is towards the top left on each screen in GitHub. The repository is a public one (which is the default, and only option unless you have a paid account on GitHub) and you do not need to provide email addresses or anything else to access the files on a public repository. Owen

-- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks, Kevin. I did notice that one of the records I showed lacked the c after the $ in the 245. Very odd, since the stylesheet constructs that subfield and I would have had no reason to touch that particular one. Phantom bytes?

dana

On Thu, Jun 13, 2013 at 2:20 PM, Ford, Kevin k...@loc.gov wrote:

Dear Dana, Thanks for the detail. Based on the few example comparisons I've seen, I very much like your MARC records more. Not only are they richer, they break up the data better. Yours, Kevin
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks for the replies. I had looked at GitHub but thought it was something different, i.e., collaborative software development; I will look again. I hadn't thought of the Internet Archive, but that might be good, and I'll take a look at Dropbox and Eric's other suggestions. I'm altogether new to the 'cloud'.

Regarding MARC records on the Gutenberg Project page: there is a new feature that converts RDF/DC to MARC, but the download was small, so I suspect it covers only recent additions. In fact, the necessary editing would remain, but it may be useful for keeping my work up to date. I'll be interested to see how it handles new line feeds in dc:title elements.

Thanks again for the suggestions, including Cary's, which comes in as I type this.

dana

On Wed, Jun 12, 2013 at 6:09 AM, Ross Singer rossfsin...@gmail.com wrote: Or the Internet Archive, since there are also a whole bunch of other MARC dumps there. -Ross.

On Jun 12, 2013, at 4:25 AM, Owen Stephens o...@ostephens.com wrote: Putting the files on GitHub might be an option - free for public repositories, and 38 MB should not be a problem to host there. Owen

Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936

On 12 Jun 2013, at 02:24, Dana Pearson dbpearsonm...@gmail.com wrote: I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage..
I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder. I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it. http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html thanks, dana -- Dana Pearson dbpearsonmlis.com -- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Kevin,

I don't know yet, since I don't know how to unzip the file (bz2?). In any case, I'm guessing that it has had none of the post-transformation editing that most libraries would insist upon. For example, subject headings in the metadata are strings with hyphens separating subjects from subheadings, so spatial, temporal, and genre subfields have to be introduced, and some content needs to go into 600, 610, 611, 630, and 651 fields. For more on the post-transform editing see: http://dbpearsonmlis.com/GPmetadata.html

dana

On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

Hi Dana, Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29 Yours, Kevin -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC

-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson Sent: Tuesday, June 11, 2013 9:24 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage.. I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.
I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it. http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html thanks, dana -- Dana Pearson dbpearsonmlis.com -- Dana Pearson dbpearsonmlis.com
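To illustrate the kind of post-transform step described in this thread (turning a hyphen-delimited subject string into MARC subfields), here is a rough Python sketch. The heuristics are assumptions for illustration only: the first segment becomes $a, a segment found in a small hypothetical list of form terms becomes $v, a segment containing a four-digit year becomes $y, and anything else becomes $x. It does not decide which 6XX tag a heading belongs in, and it does not split personal names; those are the parts that still need a cataloguer.

```python
import re

# Hypothetical short list of form subdivisions; a real list would be longer.
FORM_TERMS = {"Fiction", "Poetry", "Drama", "Juvenile literature"}
YEAR = re.compile(r"\b\d{4}\b")


def split_heading(heading):
    """Split 'A -- B -- C' into (subfield code, value) pairs using simple rules."""
    parts = [p.strip() for p in heading.split(" -- ") if p.strip()]
    subfields = []
    for position, part in enumerate(parts):
        if position == 0:
            code = "a"          # main heading
        elif part in FORM_TERMS:
            code = "v"          # form subdivision
        elif YEAR.search(part):
            code = "y"          # chronological subdivision (contains a year)
        else:
            code = "x"          # general subdivision
        subfields.append((code, part))
    return subfields


def to_mnemonic(subfields):
    """Render subfields in MarcEdit-style mnemonic form with a closing period."""
    return "".join(f"${code}{value}" for code, value in subfields) + "."


if __name__ == "__main__":
    print(to_mnemonic(split_heading("France -- History -- Regency, 1715-1723 -- Fiction")))
```

A geographic subdivision ($z) cannot be recognized by these rules at all, which is one reason the output should be treated as a first pass for human review.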
Re: [CODE4LIB] best way to make MARC files available to anyone
Kevin, Eric:

7zip worked fine to unzip, and the records look pretty good since they used 653 and preserved the string from the metadata element with the hyphens. However, the records do not use subfield d in 100 or 700 fields, and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors, though.

My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January, when I could devote 2 or 3 hours to the post-transform editing. Thought I'd just dive in, but the pool was much deeper than I had anticipated. I do think libraries will prefer my edited versions, although they differ in non-access points as well. Incidentally, there have not been many additions since my harvest.

First record in the Project Gutenberg produced records:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

I couldn't readily find the above item, but here's an example of my records by the same author.

=LDR 01002nam a22002535 4500
=001 PG18997
=006 md
=007 cr||n\|||muaua
=008 \\s2006utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=245 14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$Alexandre Dumas.
=260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
=300 \\$a1 online resource :$bmultiple file formats.
=500 \\$aRecords generated from Project Gutenberg RDF data.
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650 \0$aAdventure stories.
=650 \0$aHistorical fiction.
=651 \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655 \0$aElectronic books.
=710 2\$aProject Gutenberg.
=856 40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

thanks for your interest..

regards, dana

On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

Hi Dana, Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29 Yours, Kevin -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC

-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson Sent: Tuesday, June 11, 2013 9:24 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé. I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage.. I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.
I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it. http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html thanks, dana -- Dana Pearson dbpearsonmlis.com -- Dana Pearson dbpearsonmlis.com
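The difference between the two 100 fields in this thread ("Dumas, Alexandre, 1802-1870" kept as one string versus split into $a and $d) can be sketched in Python with a regular expression. This is a hedged illustration, not the stylesheet logic used for the edited records: the pattern and the function name split_name are assumptions, and it only recognizes a trailing birth-death range such as 1802-1870 or an open range such as 1950-. Titles that belong in $c (as in "duc d'") are not handled.

```python
import re

# Name, then a comma, then a trailing date range like 1802-1870 or 1950-.
NAME_WITH_DATES = re.compile(r"^(?P<name>.+?),\s*(?P<dates>\d{4}-(?:\d{4})?)\.?$")


def split_name(value):
    """Return MARC-style (code, value) pairs for a personal name string."""
    value = value.strip()
    match = NAME_WITH_DATES.match(value)
    if not match:
        return [("a", value)]  # no recognizable dates: leave the string whole
    # Keep the comma on $a, following the punctuation in the edited record above.
    return [("a", match.group("name") + ","), ("d", match.group("dates"))]


if __name__ == "__main__":
    pairs = split_name("Dumas, Alexandre, 1802-1870")
    print("".join(f"${code}{text}" for code, text in pairs))
```

In XSLT 2.0 the same pattern could presumably be applied with analyze-string; Python is used here only to keep the example easy to run.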
Re: [CODE4LIB] best way to make MARC files available to anyone
quick followup on the thread.. github: I looked at the cooperhewitt collection but don't see a way to download the content...I could copy and paste their content but that may not be the best approach for my files...documentation is thin, seems i would have to provide email addresses for those seeking access...but clearly that is not the case with how the cooperhewitt archive is configured.. My primary concern has been to make it as simple a process as possible for libraries which have limited technical expertise. One of the reasons I made a career change was my inability as a library director to integrate very useful online resources in the library's content discovery system. Each of the libraries I led lacked expertise and/or the technical support necessary to do so. So, quit my job, re-tooled and now working independently. Internet Archive: I did a search that included a query term MARC and found the Open Library and this may be the best option but I will have to include a field in each record I think...something I could easilydo...the marc records do download nicely...I'll send a message for guidance on this Eric's suggestion regarding MIME type is interesting as well but seems I would have to have a recognizable type like zip...would prefer to have the files no larger than 4000 or so records to facilitate processing...there are also some content libraries may not want...eg, erotic literature, juvenile content.. found the file for comparison with GP generated MARC: =LDR 00945nam a22002535 4500 =001 PG27384 =006 md =007 cr||n\|||muaua =008 \\s2008utu|o|||eng\d =042 \\$adc =090 \\$aPQ =092 \0$aeBooks =100 1\$aDumas, Alexandre,$d1802-1870. =240 14$aUne fille du régent.$lEnglish =245 14$aThe Regent's Daughter$h[electronic resource] /$cAlexandre Dumas. =260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2008. =300 \\$a1 online resource :$bmultiple file formats. =500 \\$aRecords generated from Project Gutenberg RDF data. 
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license =600 10$aOrléans, Philippe,$cduc d',$d1674-1723$vFiction. =651 \0$aFrance$xHistory$yRegency, 1715-1723$vFiction. =655 \0$aElectronic books. =710 2\$aProject Gutenberg. =856 40$uhttp://www.gutenberg.org/etext/27384$zClick to access. Gutenberg Project MARC: =LDR 00721cam a22002293a 4500 =001 27384 =003 PGUSA =008 081202s2008xxu|s|000\|\eng\d =040 \\$aPGUSA$beng =042 \\$adc =050 \4$aPQ =100 1\$aDumas, Alexandre, 1802-1870 =245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas =260 \\$bProject Gutenberg,$c2008 =500 \\$aProject Gutenberg =506 \\$aFreely available. =516 \\$aElectronic text =653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction =653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction =830 \0$aProject Gutenberg$v27384 =856 40$uhttp://www.gutenberg.org/etext/27384 =856 42$uhttp://www.gutenberg.org/license$3Rights thanks again, dana On Wed, Jun 12, 2013 at 6:19 PM, Dana Pearson dbpearsonm...@gmail.comwrote: Kevin, Eric 7zip worked fine to unzip and records look pretty good since they used 653 and preserved the string from the metadata element with the hypens. However the records do not do subfield d in 100 or 700 fields and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors though. My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January when I could devote 2 or 3 hours to the post transform editing. Thought I'd just dive in but the pool was much deeper than I had anticipated. Do think libraries will prefer my edited versions although different in non-access points as well. Incidentally, not many additions since my harvest. 
First record in the Project Gutenberg produced records:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

couldn't readily find the above item but here's an example of my records by the same author.

=LDR 01002nam a22002535 4500
=001 PG18997
=006 md
=007 cr||n\|||muaua
=008 \\s2006utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=245 14$aThe Vicomte de Bragelonne$h[electronic
[CODE4LIB] best way to make MARC files available to anyone
I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. I don't have an FTP client myself, so I was thinking that FTP may now be passé.

I tried using Google Drive, with access available via link to two versions of the files, UTF8 and MARC8. However, it seems that is not a viable solution. I can access the files with the URLs provided, having set the access to anyone with the URL, but this doesn't work for some of those testing it for me, or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous, and non-Latin scripts such as Chinese and Modern Greek. Most of the content is in the ebook folder.

I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks, dana

-- Dana Pearson dbpearsonmlis.com
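Whatever hosting is chosen, smaller files are easier for libraries to process; a later reply in this thread mentions a preference for files of 4000 or so records. As a rough sketch only, assuming the files are binary MARC (ISO 2709), where each record ends with the record terminator byte 0x1D, a file could be cut into chunks like this. The function name split_marc and the file names in the usage note are hypothetical.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC 21 end-of-record byte


def split_marc(data: bytes, per_file: int = 4000) -> list[bytes]:
    """Cut a binary MARC file into chunks of at most per_file records.

    Records are not parsed or validated here; the data is only split on
    the record terminator and re-joined in groups.
    """
    records = [
        rec + RECORD_TERMINATOR
        for rec in data.split(RECORD_TERMINATOR)
        if rec.strip()  # skip the empty tail after the last terminator
    ]
    return [
        b"".join(records[i:i + per_file])
        for i in range(0, len(records), per_file)
    ]
```

Usage might look like reading ebooks.mrc with open(..., "rb"), calling split_marc on the bytes, and writing each chunk to ebooks_001.mrc, ebooks_002.mrc, and so on. Because the split works on raw bytes, it should apply equally to the UTF8 and MARC8 versions.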
Re: [CODE4LIB] LOC Subject Headings API
Joshua,

There are different formats at LOC: http://id.loc.gov/authorities/subjects.html

dana

On Tue, Jun 4, 2013 at 6:31 PM, Joshua Welker jwel...@sbuniv.edu wrote:

I am building an auto-suggest feature into our library's search box, and I am wanting to include LOC subject headings in my suggestions list. Does anyone know of any web service that allows for automated harvesting of LOC Subject Headings? I am also looking for name authorities, for that matter. Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV...

I have spent a while Googling with no luck, but this seems like the sort of general-purpose thing that a lot of people would be interested in. I feel like I must be missing something. Any help is appreciated.

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624

-- Dana Pearson dbpearsonmlis.com
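For the auto-suggest use case in the question, a sketch along these lines may be a starting point. It rests on assumptions that should be checked against the current id.loc.gov documentation: that the service exposes a suggest endpoint under /authorities/subjects/suggest/ taking a q parameter, and that it answers in the OpenSearch Suggestions layout (a JSON array of query, labels, descriptions, URIs). The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the id.loc.gov documentation before use.
SUGGEST_URL = "https://id.loc.gov/authorities/subjects/suggest/"


def suggest_url(term: str) -> str:
    """Build the request URL for a partial search term."""
    return SUGGEST_URL + "?" + urlencode({"q": term})


def parse_suggestions(payload: str) -> list[tuple[str, str]]:
    """Turn an OpenSearch Suggestions array into (label, uri) pairs.

    Assumed layout: [query, [labels...], [descriptions...], [uris...]]
    """
    data = json.loads(payload)
    labels, uris = data[1], data[3]
    return list(zip(labels, uris))


def fetch_suggestions(term: str) -> list[tuple[str, str]]:
    """Query the service over the network and return (label, uri) pairs."""
    with urlopen(suggest_url(term)) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes it possible to test the parsing offline, and to swap in the name authorities path if that endpoint behaves the same way (also an assumption).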
Re: [CODE4LIB] MARCXML - What is it for?
i'm not a coder, but i undertook a study of XML some years after it came onto the scene, with a likely confused notion that it would be the next significant technology. I learned some XSL and later was able to weave PubMed Central journal information (CSV transformed into XML) together with Dublin Core metadata of journal articles into MARCXML during harvest with MarcEdit (which the inestimable Terry Reese continues to tweak). I also used the same XML journal data to augment NLM journal records with PubMed Central holdings and other data, with a transform in my IDE, though it took me weeks to get right. So, no aspirations to become a coder.

I probably did not get all of the MARC cataloging rules right, and I can empathize with those who come to MARC and cataloging standards without cataloging training or experience. My library experience was primarily as library director; my expertise in library specializations would always be under question.

regards, dana

-- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] newbie
I've been focusing on XSL and XQuery, but Python's on my list to do although I want to do a turn in Perl first, very versatile. Just a javascript background. regards, dana On Wed, Mar 24, 2010 at 2:24 PM, jenny jennynotanyd...@gmail.com wrote: A newly-minted library school grad who has up to this point focused my studies on Rare Books and Book Arts, I've been interested in getting back into some programming--I took two classes in college (VisualBASIC), have a smattering of web design and php, MySQL, exposure, but I'd like to try my hand at teaching myself a language in my free time. My partner is a former dotcom programmer (now studying neuroscience) and has offered to assist when needed, so I'm not completely on my own (thank goodness). My question is, where would you recommend I would begin? What's hot right now in the library world? Python, PERL, Ruby? Any advice you'd have for a beginner like me or even recommendations for online courses would be extremely appreciated JC -- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] exploiting z39.50
On 5/8/09, Xiaoming Liu xiaoming@gmail.com wrote:

On Fri, May 8, 2009 at 3:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

I wonder how xID handles superseded OCLCnums, if it'll still successfully find the right matches for you?

This is documented in http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#deleted

Worldcat uses OCLC Control Number Cross-Reference to track deleted OCLC numbers. When an OCLC number is deleted, it's still searchable from this service. In the response, we use presentOclcnum to specify the present OCLC number. For example, 2416076 was merged into 24991049, so a request for the deleted number 2416076 will return:

<rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">
<oclcnum lccn="34025476" presentOclcnum="24991049">2416076</oclcnum>
</rsp>

The presentOclcnum field is omitted when an OCLC number is active, so a request for the current OCLC number 24991049 returns:

<rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">
<oclcnum lccn="34025476">24991049</oclcnum>
</rsp>

Xiaoming

Ray Denenberg, Library of Congress wrote:

From: Eric Lease Morgan emor...@nd.edu

1. What MARC field/subfield might I put this string?
2. How would I go about getting the string indexed?
3. How might I go about querying the server for records with this string?

I can at least talk about the third question. There was work on a marc attribute set, though not completed. If you look at the oid register at http://www.loc.gov/z3950/agency/defns/oids.html you'll see that the latest work on it (second draft) was in 2000, http://www.nlc-bnc.ca/iso/z3950/MARC_attribute_set_2.doc. So if someone actually wanted to put it to use, it would have to be completed.

For SRU there is a complete marc context set, http://www.loc.gov/standards/sru/resources/marc-context-set.html.

--Ray
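The xOCLCnum behaviour Xiaoming describes earlier in this message (presentOclcnum present for a merged number, omitted for an active one) can be handled with a few lines of client code. This is a sketch only, assuming responses shaped like the two examples in that reply; the function name current_oclcnum is hypothetical and no request is made to the service.

```python
import xml.etree.ElementTree as ET

NS = "{http://worldcat.org/xid/xoclcnum/}"  # namespace from the quoted responses


def current_oclcnum(xml_text):
    """Return the OCLC number to use going forward, or None if absent.

    If presentOclcnum is set, the requested number was merged and the
    attribute holds its replacement; otherwise the element text is current.
    """
    root = ET.fromstring(xml_text)
    node = root.find(f"{NS}oclcnum")
    if node is None:
        return None
    return node.get("presentOclcnum") or (node.text or "").strip()


# The two responses described in the message.
MERGED = (
    '<rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">'
    '<oclcnum lccn="34025476" presentOclcnum="24991049">2416076</oclcnum>'
    "</rsp>"
)
ACTIVE = (
    '<rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">'
    '<oclcnum lccn="34025476">24991049</oclcnum>'
    "</rsp>"
)

if __name__ == "__main__":
    print(current_oclcnum(MERGED))
    print(current_oclcnum(ACTIVE))
```

Both calls resolve to the same surviving number, which is the point of the cross-reference: a local record carrying the old number can be matched without knowing that a merge happened.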
Re: [CODE4LIB] MARC-XML - Qualified Dublin Core XSLT
try: http://imlsdcc.grainger.uiuc.edu/docs/stylesheets/GeneralMARCtoQDC.xsl

I searched the file title (not complete path) in Google.

regards, Dana Pearson

On Fri, Mar 6, 2009 at 2:03 PM, Walker, David dwal...@calstate.edu wrote:

Hi All,

Anyone have an XSLT style sheet to convert from MARC-XML to Qualified Dublin Core? I'm looking to load these into DSpace, if that makes a difference.

Looks like LOC only has MARC-XML to Simple Dublin Core. This page [1] mentions a 'MARCXML to Qualified DC stylesheets' developed at the University of Illinois, but the links are dead.

--Dave

[1] http://cicharvest.grainger.uiuc.edu/schemas.asp

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu