Re: [CODE4LIB] best way to make MARC files available to anyone
On 13 Jun 2013, at 02:57, Dana Pearson dbpearsonm...@gmail.com wrote:

> quick followup on the thread..
>
> github: I looked at the cooperhewitt collection but don't see a way to download the content...I could copy and paste their content but that may not be the best approach for my files...documentation is thin, seems i would have to provide email addresses for those seeking access...but clearly that is not the case with how the cooperhewitt archive is configured..
>
> My primary concern has been to make it as simple a process as possible for libraries which have limited technical expertise.

I suspect from what you say that GitHub is not what you want in this case. However, I just wanted to clarify that you can download files as a Zip file (e.g. for Cooper Hewitt https://github.com/cooperhewitt/collection/archive/master.zip), and that this link is towards the top left on each screen in GitHub. The repository is a public one (which is the default, and the only option unless you have a paid account on GitHub), and you do not need to provide email addresses or anything else to access the files in a public repository.

Owen
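For anyone who would rather script the Zip download than click through, here is a minimal Python sketch. It assumes only the URL pattern visible in Owen's Cooper Hewitt example; the function names are made up for illustration, and the branch name `master` is an assumption that will not hold for every repository.

```python
import urllib.request


def archive_url(owner: str, repo: str, branch: str = "master") -> str:
    """Build the Zip-archive URL for one branch of a public GitHub repository.

    The pattern is copied from the Cooper Hewitt link in the message above.
    """
    return f"https://github.com/{owner}/{repo}/archive/{branch}.zip"


def download_archive(owner: str, repo: str, dest: str, branch: str = "master") -> str:
    """Fetch the archive to the local path `dest` and return that path.

    No login or email address is sent; this only works for public repositories.
    """
    urllib.request.urlretrieve(archive_url(owner, repo, branch), dest)
    return dest
```

Calling `download_archive("cooperhewitt", "collection", "collection.zip")` would save the same file the browser link produces, provided the repository still exists under that name and branch.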
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks Owen,

I conflated GitHub and Dropbox in my earlier summary and left out any reference to Dropbox. Dropbox is the one with the email requirement. Sorry, it was late and a hurried summary. I will look again for that download option on GitHub.

thanks again,
dana

--
Dana Pearson
dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
On Jun 12, 2013, at 10:24 AM, Daniel Lovins daniel.lov...@nyu.edu wrote:

> If anyone from HathiTrust is watching this thread, I'd also be curious if they're considering bulk record downloads via something other than OAI [1].
>
> [1] http://www.lib.umich.edu/michigan-digitization-project-oai-harvesting

While the process may not be exactly what you are looking for, it is possible to use the HathiTrust Research Center's services to do bulk downloads (of MARC and data records). [2] In a nutshell, the process is to:

  1. create an account
  2. create a work set
  3. fill the set with HathiTrust items
  4. use the Marc_Downloader algorithm to obtain metadata
  5. use their Data API to obtain full text [3]

I blogged, very briefly, on this subject. [4]

[2] https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/
[3] http://wiki.htrc.illinois.edu/display/COM/HTRC+Data+API+Users+Guide
[4] http://dh.crc.nd.edu/blog/2013/05/htrc/

--
Eric Lease Morgan
University of Notre Dame
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks very much, Eric. I'll definitely take a look at your blog post.

- Daniel

Daniel Lovins
Head of Knowledge Access, Design Development
Knowledge Access Resource Management Services
New York University, Division of Libraries
20 Cooper Square, 3rd floor
New York, NY 10003-7112
daniel.lov...@nyu.edu
212-998-2489
Re: [CODE4LIB] best way to make MARC files available to anyone
Dear Dana,

Thanks for the detail. Based on the few example comparisons I've seen, I very much like your MARC records more. Not only are they richer, they break up the data better.

Yours,
Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Wednesday, June 12, 2013 7:20 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] best way to make MARC files available to anyone

Kevin, Eric

7zip worked fine to unzip and records look pretty good since they used 653 and preserved the string from the metadata element with the hyphens. However the records do not do subfield d in 100 or 700 fields and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors though.

My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January when I could devote 2 or 3 hours to the post transform editing. Thought I'd just dive in but the pool was much deeper than I had anticipated.

Do think libraries will prefer my edited versions although different in non-access points as well. Incidentally, not many additions since my harvest.

First record in the Project Gutenberg produced records:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

couldn't readily find the above item but here's an example of my records by the same author.

=LDR 01002nam a22002535 4500
=001 PG18997
=006 md
=007 cr||n\|||muaua
=008 \\s2006utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=245 14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$Alexandre Dumas.
=260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
=300 \\$a1 online resource :$bmultiple file formats.
=500 \\$aRecords generated from Project Gutenberg RDF data.
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650 \0$aAdventure stories.
=650 \0$aHistorical fiction.
=651 \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655 \0$aElectronic books.
=710 2\$aProject Gutenberg.
=856 40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

thanks for your interest..

regards,
dana
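The difference in the 100 field (dates left inside $a versus moved to $d) can be illustrated with a short Python sketch. This is not Dana's XSLT; it is a hypothetical helper that assumes the dates are always the last comma-separated piece of the heading, which will not be true of every Project Gutenberg creator string.

```python
import re

# Matches a heading whose last comma-separated piece looks like life dates,
# e.g. "1802-1870", "1802-" or "-1870". The pattern is an assumption about
# the input, not a rule taken from the MARC standard.
_DATES = re.compile(
    r"^(?P<name>.+?),\s*(?P<dates>\d{3,4}\??-(?:\d{3,4}\??)?|-\d{3,4})\.?$"
)


def split_name_dates(heading: str) -> list[tuple[str, str]]:
    """Split a personal-name heading into (subfield code, value) pairs."""
    heading = heading.strip()
    match = _DATES.match(heading)
    if not match:
        # No recognizable dates: keep everything in $a.
        return [("a", heading)]
    return [("a", match.group("name") + ","), ("d", match.group("dates"))]


def to_mnemonic(subfields: list[tuple[str, str]]) -> str:
    """Render the pairs the way they appear in the records above ($a...$d...)."""
    return "".join(f"${code}{value}" for code, value in subfields)
```

Applied to the heading in the first record, `to_mnemonic(split_name_dates("Dumas, Alexandre, 1802-1870"))` gives `$aDumas, Alexandre,$d1802-1870`, which matches the shape of the 100 field in the second record apart from the closing period.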
Re: [CODE4LIB] best way to make MARC files available to anyone
thanks, Kevin...did notice that one of the records I showed lacked the c after the $ in the 245...very odd since the stylesheet constructs that subfield and I would have had no reason to touch that particular one...phantom bytes?

dana
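A dropped subfield code like the one Dana spotted can be caught mechanically before records are published. The following is a rough Python sketch, not a MarcEdit feature. It assumes the records are in the mnemonic text form shown in this thread, that subfield codes are lowercase letters or digits, and that any literal dollar sign in the data is escaped rather than written as a bare `$`. Both function names are hypothetical.

```python
import re

# A "$" that is not followed by a lowercase letter or digit has no usable code.
_BAD_DELIMITER = re.compile(r"\$(?![a-z0-9])")
# A subfield whose code is immediately followed by another "$" or end of line is empty.
_EMPTY_SUBFIELD = re.compile(r"\$([a-z0-9])(?=\$|$)")


def missing_subfield_codes(mnemonic_line: str) -> list[int]:
    """Return the character offsets of "$" delimiters that lack a subfield code."""
    return [m.start() for m in _BAD_DELIMITER.finditer(mnemonic_line)]


def empty_subfields(mnemonic_line: str) -> list[str]:
    """Return the codes of subfields that carry no data at all."""
    return _EMPTY_SUBFIELD.findall(mnemonic_line)
```

The first check flags the `/$Alexandre Dumas.` case, and the second flags the empty `$a` in the Project Gutenberg 245. Neither would notice a code that is present but wrong, so this is only a first-pass filter.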
Re: [CODE4LIB] best way to make MARC files available to anyone
Putting the files on GitHub might be an option - free for public repositories, and 38 MB should not be a problem to host there.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 12 Jun 2013, at 02:24, Dana Pearson dbpearsonm...@gmail.com wrote:

> I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested.
>
> I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé.
>
> I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage..
>
> I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.
>
> I would like to make access as easy as possible.
>
> Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.
>
> http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
>
> thanks,
> dana
>
> --
> Dana Pearson
> dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Or the Internet Archive, since there are also a whole bunch of other MARC dumps there.

-Ross.
Re: [CODE4LIB] best way to make MARC files available to anyone
I would put them on Dropbox or S3. The Dropbox free account is 5 GB.

Cary

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
Re: [CODE4LIB] best way to make MARC files available to anyone
Thanks for the replies. I had looked at GitHub but thought it was something different, i.e., collaborative software development. I will look again.

I hadn't thought of the Internet Archive, but that might be good, and I'll take a look at Dropbox and Eric's other suggestions. I am altogether new to the 'cloud'.

Regarding MARC records on the Project Gutenberg page: there is a new feature that converts RDF/DC to MARC, but the download was small, so I suspect it covers only recent additions. The necessary editing would remain, but it may be useful for keeping my work up to date. I'll be interested to see how it handles new line feeds in dc:title elements.

thanks again for the suggestions, including Cary's, which came in as I typed this

dana

--
Dana Pearson
dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
On 12 Jun 2013, at 14:06, Dana Pearson dbpearsonm...@gmail.com wrote:

> Thanks for the replies..I had looked at GitHub but thought it something different, ie, collaborative software development...I will look again

Yes - that's the main use (git is version control software, GitHub hosts git repositories) - but of course git doesn't care what types of files you have under version control. It came to mind because I know it's been used to distribute metadata files before - e.g. this set of metadata from the Cooper Hewitt National Design Museum: https://github.com/cooperhewitt/collection

There could be some additional benefits gained through using git to version control this type of file, and GitHub to distribute them if you were interested, but it can act as simply a place to put the files and make them available for download. But of course the other suggestions would do this simpler task just as well.

Owen
Re: [CODE4LIB] best way to make MARC files available to anyone
Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.:

http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC
Re: [CODE4LIB] best way to make MARC files available to anyone
Doh! I read all the emails in the thread except for Eric's, which asked the same question. Either way, his or mine, I'm nevertheless curious.

Kevin

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Phetteplace
Sent: Tuesday, June 11, 2013 10:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] best way to make MARC files available to anyone

Dana - perhaps a public Dropbox folder? Or just put the files up on your site somewhere, served with a Content-Disposition: attachment header so they trigger a download when accessed? E.g. here's a StackOverflow thread on that: http://stackoverflow.com/questions/9195304/how-to-use-content-disposition-for-force-a-file-to-download-to-the-hard-drive

If they must be a recognized MIME type, you could compress them as .zip or .tar.gz files on the server, which would reduce download time either way.

I did try clicking the links on your site and they never downloaded, the request just timed out.

Not to discredit what you're doing, which is great, but aren't MARC records already available for Project Gutenberg? See their offline catalogs page: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28offsite.29

Best,
Eric Phetteplace
Emerging Technologies Librarian
Chesapeake College
Wye Mills, MD
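Eric's Content-Disposition suggestion can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not a recommendation for production hosting: the `.mrc` and `.zip` extensions, the class name, and the port are all made up for the example, and most real web servers would set the same header through their own configuration instead.

```python
import http.server
import os


class AttachmentHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from a directory, asking browsers to download MARC files."""

    def end_headers(self):
        # end_headers() runs once per response, just before headers are sent,
        # so it is a convenient place to add one more header.
        filename = os.path.basename(self.path.split("?", 1)[0])
        if filename.endswith((".mrc", ".zip")):
            self.send_header(
                "Content-Disposition", f'attachment; filename="{filename}"'
            )
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory until interrupted (hypothetical entry point)."""
    http.server.HTTPServer(("", port), AttachmentHandler).serve_forever()
```

With this running, a request for a file such as `ebooks.mrc` comes back with `Content-Disposition: attachment`, so the browser offers a save dialog instead of trying to display the bytes.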
Re: [CODE4LIB] best way to make MARC files available to anyone
If anyone from HathiTrust is watching this thread, I'd also be curious if they're considering bulk record downloads via something other than OAI [1].

Thanks.

Daniel

[1] http://www.lib.umich.edu/michigan-digitization-project-oai-harvesting
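For context on why a bulk download might be preferable to OAI, here is a minimal Python sketch of what an OAI-PMH harvest involves: repeated ListRecords requests, each returning one page plus a resumptionToken for the next. The verb, argument names, and XML namespace come from the OAI-PMH 2.0 protocol. The base URL, the `marc21` metadata prefix, and the `fetch` callable are assumptions for illustration, since the prefixes a repository supports vary.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="marc21", resumption_token=None):
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken in a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f"./{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest(base_url, fetch, metadata_prefix="marc21"):
    """Yield each page of XML; `fetch` is any callable mapping a URL to text."""
    token = None
    while True:
        page = fetch(list_records_url(base_url, metadata_prefix, token))
        yield page
        token = next_token(page)
        if token is None:
            break
```

A large collection means many sequential round trips, which is the practical cost behind Daniel's question.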
Re: [CODE4LIB] best way to make MARC files available to anyone
Kevin,

I don't know yet, since I don't know how to unzip the file (bz2?). In any case, I'm guessing that their files have had none of the post-transformation editing that most libraries would insist upon. For example, subject headings in the metadata are strings with hyphens separating subjects from subheadings, so spatial, temporal, and genre subfields have to be introduced, and some content needs to go into 600, 610, 611, 630, and 651 fields. For more on the post-transform editing see: http://dbpearsonmlis.com/GPmetadata.html

dana

On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé.

I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.

I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

--
Dana Pearson
dbpearsonmlis.com

--
Dana Pearson
dbpearsonmlis.com
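On the bz2 question raised in this message: a .bz2 file is a single bzip2-compressed file rather than a zip archive, and Python's standard library can unpack it. The following is only a minimal sketch, assuming Python 3 is available; the function name and the file names in the comment are made up for illustration.

```python
import bz2
import shutil


def decompress_bz2(src_path, dest_path):
    """Decompress the bzip2 file at src_path and write the result to dest_path."""
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Copy in blocks so a large catalog file is never held in memory at once.
        shutil.copyfileobj(src, dest)


# Hypothetical usage; substitute the name of the file actually downloaded:
# decompress_bz2("catalog.marc.bz2", "catalog.marc")
```

Desktop tools such as 7-Zip handle the same format for anyone who would rather not run a script.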
Re: [CODE4LIB] best way to make MARC files available to anyone
Kevin, Eric

7zip worked fine to unzip, and the records look pretty good, since they used 653 and preserved the string from the metadata element with the hyphens. However, the records do not use subfield d in 100 or 700 fields, and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors, though.

My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January, when I could devote 2 or 3 hours to the post-transform editing. Thought I'd just dive in, but the pool was much deeper than I had anticipated. I do think libraries will prefer my edited versions, although they differ in non-access points as well. Incidentally, not many additions since my harvest.

First record in the Project Gutenberg produced records:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

I couldn't readily find the above item among my own records, but here's an example of my records by the same author:

=LDR 01002nam a22002535 4500
=001 PG18997
=006 md
=007 cr||n\|||muaua
=008 \\s2006utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=245 14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$cAlexandre Dumas.
=260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
=300 \\$a1 online resource :$bmultiple file formats.
=500 \\$aRecords generated from Project Gutenberg RDF data.
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650 \0$aAdventure stories.
=650 \0$aHistorical fiction.
=651 \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655 \0$aElectronic books.
=710 2\$aProject Gutenberg.
=856 40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

thanks for your interest..

regards,
dana

On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC files? See, e.g.: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé.

I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.

I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

--
Dana Pearson
dbpearsonmlis.com

--
Dana Pearson
dbpearsonmlis.com
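To make the difference between the hyphenated 653 strings and subfielded 65X headings concrete, here is a rough sketch of the purely mechanical part of that editing: splitting a " -- " delimited subject string into subfields. The function names, the short list of form terms, and the rules (a listed form term becomes $v, a part containing a four-digit year becomes $y, anything else $x) are assumptions for illustration. This is not the stylesheet used for these records, and choosing the right tag (600, 610, 611, 630, 650, 651) or breaking a personal name into $c and $d still takes a cataloger's judgment.

```python
import re

# Assumed, illustrative list of form/genre terms that belong in $v.
FORM_TERMS = {"Fiction", "Poetry", "Drama", "Juvenile literature"}


def split_subject(heading):
    """Split 'A -- B -- C' into a list of (subfield code, value) pairs."""
    parts = [p.strip() for p in heading.split(" -- ") if p.strip()]
    subfields = [("a", parts[0])] if parts else []
    for part in parts[1:]:
        if part in FORM_TERMS:
            code = "v"  # form subdivision
        elif re.search(r"\b\d{4}\b", part):
            code = "y"  # chronological subdivision (contains a year)
        else:
            code = "x"  # general subdivision
        subfields.append((code, part))
    return subfields


def to_mnemonic(tag, indicators, subfields):
    """Render the subfields as one mnemonic-style line with a closing period."""
    body = "".join(f"${code}{value}" for code, value in subfields)
    return f"={tag} {indicators}{body}."
```

Calling `to_mnemonic("651", "\\0", split_subject("France -- History -- Regency, 1715-1723 -- Fiction"))` returns `=651 \0$aFrance$xHistory$yRegency, 1715-1723$vFiction.`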
Re: [CODE4LIB] best way to make MARC files available to anyone
Quick followup on the thread.

GitHub: I looked at the cooperhewitt collection but don't see a way to download the content. I could copy and paste their content, but that may not be the best approach for my files. Documentation is thin, and it seems I would have to provide email addresses for those seeking access, but clearly that is not the case with how the cooperhewitt archive is configured.

My primary concern has been to make it as simple a process as possible for libraries which have limited technical expertise. One of the reasons I made a career change was my inability as a library director to integrate very useful online resources in the library's content discovery system. Each of the libraries I led lacked the expertise and/or the technical support necessary to do so. So, I quit my job, re-tooled, and am now working independently.

Internet Archive: I did a search that included the query term MARC and found the Open Library, and this may be the best option, but I think I will have to include a field in each record, something I could easily do. The MARC records do download nicely. I'll send a message for guidance on this.

Eric's suggestion regarding MIME type is interesting as well, but it seems I would have to have a recognizable type like zip. I would prefer to have the files no larger than 4000 or so records to facilitate processing. There is also some content libraries may not want, e.g., erotic literature, juvenile content.

I found the file for comparison with the GP-generated MARC:

=LDR 00945nam a22002535 4500
=001 PG27384
=006 md
=007 cr||n\|||muaua
=008 \\s2008utu|o|||eng\d
=042 \\$adc
=090 \\$aPQ
=092 \0$aeBooks
=100 1\$aDumas, Alexandre,$d1802-1870.
=240 14$aUne fille du régent.$lEnglish
=245 14$aThe Regent's Daughter$h[electronic resource] /$cAlexandre Dumas.
=260 \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2008.
=300 \\$a1 online resource :$bmultiple file formats.
=500 \\$aRecords generated from Project Gutenberg RDF data.
=540 \\$aApplicable license:$uhttp://www.gutenberg.org/license
=600 10$aOrléans, Philippe,$cduc d',$d1674-1723$vFiction.
=651 \0$aFrance$xHistory$yRegency, 1715-1723$vFiction.
=655 \0$aElectronic books.
=710 2\$aProject Gutenberg.
=856 40$uhttp://www.gutenberg.org/etext/27384$zClick to access.

Gutenberg Project MARC:

=LDR 00721cam a22002293a 4500
=001 27384
=003 PGUSA
=008 081202s2008xxu|s|000\|\eng\d
=040 \\$aPGUSA$beng
=042 \\$adc
=050 \4$aPQ
=100 1\$aDumas, Alexandre, 1802-1870
=245 10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260 \\$bProject Gutenberg,$c2008
=500 \\$aProject Gutenberg
=506 \\$aFreely available.
=516 \\$aElectronic text
=653 \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653 \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830 \0$aProject Gutenberg$v27384
=856 40$uhttp://www.gutenberg.org/etext/27384
=856 42$uhttp://www.gutenberg.org/license$3Rights

thanks again,
dana

On Wed, Jun 12, 2013 at 6:19 PM, Dana Pearson dbpearsonm...@gmail.com wrote:

Kevin, Eric

7zip worked fine to unzip and records look pretty good since they used 653 and preserved the string from the metadata element with the hyphens. However the records do not do subfield d in 100 or 700 fields and thus such content appears in the 245$c. 245$a seems to go missing with some frequency. MarcEdit does not report any errors though.

My original intent was just to keep my XSLT skills sharp while I had some free time last August. After creating the stylesheet, I then had no free time until January when I could devote 2 or 3 hours to the post transform editing. Thought I'd just dive in but the pool was much deeper than I had anticipated. Do think libraries will prefer my edited versions although different in non-access points as well. Incidentally, not many additions since my harvest.
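On keeping each file to 4000 or so records: in a binary MARC (ISO 2709) file every record ends with the record terminator byte 0x1D, so a file can be cut into smaller files without parsing any fields. The following is a minimal sketch under that assumption, using only the Python standard library. The function names and the output file naming are made up for illustration, and the whole file is read into memory, which seems reasonable at the roughly 38 MB mentioned in this thread.

```python
RECORD_TERMINATOR = b"\x1d"  # ends every record in an ISO 2709 (binary MARC) file


def split_marc(data, records_per_file=4000):
    """Split raw binary MARC bytes into chunks of at most records_per_file records."""
    # Re-attach the terminator to each record and drop the empty piece after the last one.
    records = [r + RECORD_TERMINATOR for r in data.split(RECORD_TERMINATOR) if r.strip()]
    return [
        b"".join(records[i:i + records_per_file])
        for i in range(0, len(records), records_per_file)
    ]


def write_chunks(src_path, prefix, records_per_file=4000):
    """Write prefix_001.mrc, prefix_002.mrc, ... and return the paths written."""
    with open(src_path, "rb") as src:
        chunks = split_marc(src.read(), records_per_file)
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = f"{prefix}_{n:03d}.mrc"
        with open(path, "wb") as out:
            out.write(chunk)
        paths.append(path)
    return paths
```

Because the split works on raw bytes, it applies equally to the UTF8 and MARC8 versions of the files. It does not check that each record is otherwise valid, so running the results through MarcEdit afterwards would still be a sensible step.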
Re: [CODE4LIB] best way to make MARC files available to anyone
Dana - perhaps a public Dropbox folder? Or just put the files up on your site somewhere, served with a Content-Disposition: attachment header so they trigger a download when accessed? E.g. here's a StackOverflow thread on that: http://stackoverflow.com/questions/9195304/how-to-use-content-disposition-for-force-a-file-to-download-to-the-hard-drive

If they must be a recognized MIME type, you could compress them as .zip or .tar.gz files on the server, which would reduce download time either way. I did try clicking the links on your site and they never downloaded; the request just timed out.

Not to discredit what you're doing, which is great, but aren't MARC records already available for Project Gutenberg? See their offline catalogs page: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28offsite.29

Best,
Eric Phetteplace
Emerging Technologies Librarian
Chesapeake College
Wye Mills, MD

On Tue, Jun 11, 2013 at 9:24 PM, Dana Pearson dbpearsonm...@gmail.com wrote:

I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would like to make these files available to any library that is interested. I thought that I would put them on my website via FTP but don't know if that is the best way. Don't have an ftp client myself so was thinking that that may be now passé.

I tried using Google Drive with access available via the link to two versions of the files, UTF8 and MARC8. However, it seems that that is not a viable solution. I can access the files with the URLs provided by setting the access to anyone with the URL but doesn't work for some of those testing it for me or with the links I have on my webpage.

I have five folders with files of about 38 MB total. I have separated the ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts such as Chinese, Modern Greek. Most of the content is in the ebook folder.

I would like to make access as easy as possible. Google Drive seems to work for me. Here's the link to my page with the links in case you would like to look at the folders. Works for me but not for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

--
Dana Pearson
dbpearsonmlis.com
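For the Content-Disposition suggestion in this message, here is a minimal sketch of a server that adds the header to every file it serves, using only Python's standard http.server module. The class name, helper names, directory name, and port are assumptions for illustration. On an existing web host the same header would more likely be set in the web server's own configuration than in a script like this.

```python
import posixpath
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import unquote


def content_disposition(url_path):
    """Return a Content-Disposition value for a request path, or None for directories."""
    # Drop any query string, decode %20 and the like, and keep the last path segment.
    filename = posixpath.basename(unquote(url_path.split("?", 1)[0]))
    if not filename:
        return None
    return f'attachment; filename="{filename}"'


class AttachmentHandler(SimpleHTTPRequestHandler):
    """Serve files with a header asking browsers to save them rather than display them."""

    def end_headers(self):
        value = content_disposition(self.path)
        if value:
            self.send_header("Content-Disposition", value)
        super().end_headers()


def make_server(directory, port=8000):
    """Build (but do not start) a server for the files in the given directory."""
    handler = partial(AttachmentHandler, directory=directory)
    return HTTPServer(("", port), handler)


# Hypothetical usage: make_server("marc_files").serve_forever()
```

With the header in place, a browser that follows a link to a .mrc file should offer to save it under that name instead of trying to render it, whatever MIME type the server reports.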