Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Owen Stephens
On 13 Jun 2013, at 02:57, Dana Pearson dbpearsonm...@gmail.com wrote:

 quick followup on the thread..
 
 GitHub:  I looked at the cooperhewitt collection but don't see a way to
 download the content...I could copy and paste their content but that may
 not be the best approach for my files...documentation is thin, seems I
 would have to provide email addresses for those seeking access...but
 clearly that is not the case with how the cooperhewitt archive is
 configured..
 
 My primary concern has been to make it as simple a process as possible for
 libraries which have limited technical expertise. 

I suspect from what you say that GitHub is not what you want in this case. 
However, I just wanted to clarify that you can download files as a Zip file 
(e.g. for Cooper Hewitt 
https://github.com/cooperhewitt/collection/archive/master.zip), and that this 
link is towards the top left on each screen in GitHub. The repository is a 
public one (which is the default, and only option unless you have a paid 
account on GitHub) and you do not need to provide email addresses or anything 
else to access the files on a public repository.

Owen
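For readers who would rather script the download than click the button, the Zip link Owen describes follows the pattern visible in the Cooper Hewitt URL above. The Python sketch below assumes that `/archive/<branch>.zip` pattern holds for other public repositories; `archive_url` and `download_and_extract` are hypothetical helper names, and only the standard library is used.

```python
import io
import urllib.request
import zipfile


def archive_url(owner: str, repo: str, branch: str = "master") -> str:
    """Zip download URL for one branch of a public GitHub repository.

    Follows the /archive/<branch>.zip pattern of the Cooper Hewitt link (an assumption,
    not a documented API).
    """
    return f"https://github.com/{owner}/{repo}/archive/{branch}.zip"


def download_and_extract(url: str, dest: str) -> list:
    """Fetch a Zip archive and unpack it into `dest`, returning the archive's entry names.

    A public repository needs no login or email address, so a plain GET request is enough.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, `download_and_extract(archive_url("cooperhewitt", "collection"), "cooperhewitt-collection")` would fetch and unpack the Cooper Hewitt repository, network permitting; no GitHub account is involved at any step.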


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Dana Pearson
Thanks Owen,

I conflated GitHub and Dropbox in my earlier summary and left out any
reference to Dropbox...they do the email requirement...sorry...it was late
and a hurried summary...will look again for that download option on GitHub

thanks again,
dana

-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Eric Lease Morgan
On Jun 12, 2013, at 10:24 AM, Daniel Lovins daniel.lov...@nyu.edu wrote:

 If anyone from HathiTrust is watching this thread, I'd also be curious if
 they're considering bulk record downloads via something other than OAI [1].
 
 [1] http://www.lib.umich.edu/michigan-digitization-project-oai-harvesting


While the process may not be exactly what you are looking for, it is possible 
to use the HathiTrust Research Center's services to do bulk downloads (of MARC 
and data records). [2] In a nutshell, the process is to:

  1. create an account
  2. create a work set
  3. fill the set with HathiTrust items
  4. use the Marc_Downloader algorithm to obtain metadata
  5. use their Data API to obtain full text [3]

I blogged, very briefly, on this subject. [4]

[2] https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/
[3] http://wiki.htrc.illinois.edu/display/COM/HTRC+Data+API+Users+Guide
[4] http://dh.crc.nd.edu/blog/2013/05/htrc/

--
Eric Lease Morgan
University of Notre Dame


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Daniel Lovins
Thanks very much, Eric. I'll definitely take a look at your blog post.

- Daniel

Daniel Lovins
Head of Knowledge Access, Design & Development
Knowledge Access & Resource Management Services
New York University, Division of Libraries
20 Cooper Square, 3rd floor
New York, NY 10003-7112
daniel.lov...@nyu.edu
212-998-2489


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Ford, Kevin
Dear Dana,

Thanks for the detail.  Based on the few example comparisons I've seen, I very 
much like your MARC records more.  Not only are they richer, they break up the 
data better.

Yours,
Kevin


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Dana Pearson
 Sent: Wednesday, June 12, 2013 7:20 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] best way to make MARC files available to anyone
 
 Kevin, Eric
 
 7zip worked fine to unzip and records look pretty good since they used
 653 and preserved the string from the metadata element with the hyphens.
  However, the records do not use subfield d in 100 or 700 fields and
 thus such content appears in the 245$c.  245$a seems to go missing with
 some frequency.  MarcEdit does not report any errors though.
 
 My original intent was just to keep my XSLT skills sharp while I had
 some free time last August.  After creating the stylesheet, I then had
 no free time until January when I could devote 2 or 3 hours to the post
 transform editing.  Thought I'd just dive in but the pool was much
 deeper than I had anticipated.
 
 I do think libraries will prefer my edited versions, although they also
 differ in non-access points.  Incidentally, there have not been many
 additions since my harvest.
 
 First record in the Project Gutenberg-produced records:
 
 =LDR  00721cam a22002293a 4500
 =001  27384
 =003  PGUSA
 =008  081202s2008xxu|s|000\|\eng\d
 =040  \\$aPGUSA$beng
 =042  \\$adc
 =050  \4$aPQ
 =100  1\$aDumas, Alexandre, 1802-1870
 =245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
 =260  \\$bProject Gutenberg,$c2008
 =500  \\$aProject Gutenberg
 =506  \\$aFreely available.
 =516  \\$aElectronic text
 =653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
 =653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
 =830  \0$aProject Gutenberg$v27384
 =856  40$uhttp://www.gutenberg.org/etext/27384
 =856  42$uhttp://www.gutenberg.org/license$3Rights
 
 couldn't readily find the above item but here's an example of my
 records by the same author.
 
 =LDR  01002nam a22002535  4500
 =001  PG18997
 =006  md
 =007  cr||n\|||muaua
 =008  \\s2006utu|o|||eng\d
 =042  \\$adc
 =090  \\$aPQ
 =092  \0$aeBooks
 =100  1\$aDumas, Alexandre,$d1802-1870.
 =245  14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$Alexandre Dumas.
 =260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
 =300  \\$a1 online resource :$bmultiple file formats.
 =500  \\$aRecords generated from Project Gutenberg RDF data.
 =540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
 =650  \0$aAdventure stories.
 =650  \0$aHistorical fiction.
 =651  \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
 =655  \0$aElectronic books.
 =710  2\$aProject Gutenberg.
 =856  40$uhttp://www.gutenberg.org/etext/18997$zClick to access.
 
 thanks for your interest..
 
 regards,
 dana
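The 100 field difference Dana describes (life dates left inside $a in the Project Gutenberg record, coded as $d in hers) can be sketched as a small regular-expression step. This is a hypothetical illustration in Python, not Dana's XSLT stylesheet, and it only covers the simple `Surname, Forename, YYYY-YYYY` shape of the Dumas example; `split_name_dates` is an illustrative name.

```python
import re

# Trailing ", 1802-1870" (or open "1950-") date range after the name; a hypothetical rule
# for the simple case in the thread, not a full set of heading punctuation rules.
TRAILING_DATES = re.compile(r"^(?P<name>.+?),\s*(?P<dates>\d{3,4}\??-(?:\d{3,4}\??)?)$")


def split_name_dates(heading: str) -> str:
    """Turn 'Dumas, Alexandre, 1802-1870' into mnemonic subfields '$aDumas, Alexandre,$d1802-1870.'"""
    heading = heading.strip()
    match = TRAILING_DATES.match(heading)
    if not match:
        return f"$a{heading}"  # no trailing dates: leave the whole heading in $a
    dates = match.group("dates")
    if not dates.endswith("-"):
        dates += "."  # closed range gets a terminal period; an open range ("1950-") does not
    return f"$a{match.group('name')},$d{dates}"


print(split_name_dates("Dumas, Alexandre, 1802-1870"))
# $aDumas, Alexandre,$d1802-1870.
```

Headings with titles of nobility, fuller forms of name or approximate dates would need more rules than this.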

Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Dana Pearson
thanks, Kevin...did notice that one of the records I showed lacked the c
after the $ in the 245...very odd since the stylesheet constructs that
subfield and I would have had no reason to touch that particular
one...phantom bytes?

dana
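A stray `$` with no subfield code after it, like the one Dana spotted in the 245, can be caught in the mnemonic (.mrk) text before the file is compiled to binary MARC. The Python sketch below assumes MarcEdit-style lines as quoted in this thread (`=TAG  IND$aValue...`, with `\` for a blank indicator) and that any literal dollar signs are escaped, so every bare `$` should begin a subfield. It flags a `$` not followed by a lowercase letter or digit and a 245 lacking a non-empty $a, which also catches the empty 245 $a in the Project Gutenberg sample. `check_line` is a hypothetical name.

```python
import re
from typing import List

# "=245  14$aThe Vicomte..." -> three-character tag, two spaces, two indicators, subfield data.
# This layout is assumed from the MarcEdit-style lines quoted in the thread.
FIELD_LINE = re.compile(r"^=(?P<tag>\w{3})  (?P<ind>..)(?P<data>.*)$")
SUBFIELD_CODE = re.compile(r"[a-z0-9]")


def check_line(line: str) -> List[str]:
    """List the subfield problems found in one mnemonic field line (empty list if none)."""
    match = FIELD_LINE.match(line.rstrip("\r\n"))
    if not match:
        return ["not a mnemonic field line"]
    tag, data = match.group("tag"), match.group("data")
    if tag == "LDR" or tag < "010":
        return []  # leader and control fields (001-009) carry no subfields
    problems = []
    if not data.startswith("$"):
        problems.append(f"{tag}: data does not begin with a subfield delimiter")
    codes = []
    for chunk in data.split("$")[1:]:
        if not chunk or not SUBFIELD_CODE.match(chunk[0]):
            problems.append(f"{tag}: '$' not followed by a subfield code near {chunk[:12]!r}")
            continue
        codes.append(chunk[0])
        if tag == "245" and chunk[0] == "a" and not chunk[1:].strip():
            problems.append("245: subfield $a is empty")
    if tag == "245" and "a" not in codes:
        problems.append("245: no subfield $a")
    return problems
```

Running it over each line of a .mrk file and printing any non-empty results gives a quick report of lines to inspect by hand.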


On Thu, Jun 13, 2013 at 2:20 PM, Ford, Kevin k...@loc.gov wrote:

 Dear Dana,

 Thanks for the detail.  Based on the few example comparisons I've seen, I
 very much like your MARC records more.  Not only are they richer, they
 break up the data better.

 Yours,
 Kevin


  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Dana Pearson
  Sent: Wednesday, June 12, 2013 7:20 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] best way to make MARC files available to anyone
 
  Kevin, Eric
 
  7zip worked fine to unzip and records look pretty good since they used
  653 and preserved the string from the metadata element with the hypens.
   However the records do not do subfield d in 100 or 700 fields and
  thus such content appears in the 245$c.  245$a seems to go missing with
  some frequency.  MarcEdit does not report any errors though.
 
  My original intent was just to keep my XSLT skills sharp while I had
  some free time last August.  After creating the stylesheet, I then had
  no free time until January when I could devote 2 or 3 hours to the post
  transform editing.  Thought I'd just dive in but the pool was much
  deeper than I had anticipated.
 
  Do think libraries will prefer my edited versions although different in
  non-access points as well.  Incidentally, not many additions since my
  harvest.
 
  First record in the Project Gutenberg produced records:
 
  =LDR  00721cam a22002293a 4500
  =001  27384
  =003  PGUSA
  =008  081202s2008xxu|s|000\|\eng\d
  =040  \\$aPGUSA$beng
  =042  \\$adc
  =050  \4$aPQ
  =100  1\$aDumas, Alexandre, 1802-1870
  =245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
  =260  \\$bProject Gutenberg,$c2008
  =500  \\$aProject Gutenberg
  =506  \\$aFreely available.
  =516  \\$aElectronic text
  =653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
  =653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
  =830  \0$aProject Gutenberg$v27384
  =856  40$uhttp://www.gutenberg.org/etext/27384
  =856  42$uhttp://www.gutenberg.org/license$3Rights
 
  couldn't readily find the above item but here's an example of my
  records by the same author.
 
  =LDR  01002nam a22002535  4500
  =001  PG18997
  =006  md
  =007  cr||n\|||muaua
  =008  \\s2006utu|o|||eng\d
  =042  \\$adc
  =090  \\$aPQ
  =092  \0$aeBooks
  =100  1\$aDumas, Alexandre,$d1802-1870.
  =245  14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten
  Years Later being the completion of The Three Musketeers And Twenty
  Years After /$Alexandre Dumas.
  =260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive
  Foundation,$c2006.
  =300  \\$a1 online resource :$bmultiple file formats.
  =500  \\$aRecords generated from Project Gutenberg RDF data.
  =540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
  =650  \0$aAdventure stories.
  =650  \0$aHistorical fiction.
  =651  \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
  =655  \0$aElectronic books.
  =710  2\$aProject Gutenberg.
  =856  40$uhttp://www.gutenberg.org/etext/18997$zClick to access.
 
  thanks for your interest..
 
  regards,
  dana
 
 
  On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:
 
   Hi Dana,
  
   Out of curiosity, how does your crosswalk differ from Project
   Gutenberg's MARC files?  See, e.g.:
  
  
  
  http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_
   .28automatically_generated.29
  
   Yours,
   Kevin
  
   --
   Kevin Ford
   Network Development and MARC Standards Office Library of Congress
   Washington, DC
  
  
  
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
  Behalf
Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone
   
I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.
  I
would like to make these files available to any library that is
interested.
   
I thought that I would put them on my website via FTP but don't
  know
if that is the best way.  Don't have an ftp client myself so was
thinking that that may be now passé.
   
I tried using Google Drive with access available via the link to
  two
versions of the files, UTF8 and MARC8.  However, it seems that that
is not a viable solution.  I can access the files with the URLs
provided by setting the access to anyone with the URL but doesn't
work for some of those testing it for me or with the links I have
  on my webpage..
   
I have five folders with files of about 38 MB total.  I have
separated the ebooks, audio books, juvenile

Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Owen Stephens
Putting the files on GitHub might be an option - free for public repositories, 
and 38 MB should not be a problem to host there.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 12 Jun 2013, at 02:24, Dana Pearson dbpearsonm...@gmail.com wrote:

 I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I would
 like to make these files available to any library that is interested.
 
 I thought that I would put them on my website via FTP but don't know if
 that is the best way.  Don't have an ftp client myself so was thinking that
 that may be now passé.
 
 I tried using Google Drive with access available via the link to two
 versions of the files, UTF8 and MARC8.  However, it seems that that is not
 a viable solution.  I can access the files with the URLs provided by
 setting the access to anyone with the URL but doesn't work for some of
 those testing it for me or with the links I have on my webpage..
 
 I have five folders with files of about 38 MB total.  I have separated the
 ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts
 such as Chinese, Modern Greek.  Most of the content is in the ebook folder.
 
 I would like to make access as easy as possible.
 
 Google Drive seems to work for me.  Here's the link to my page with the
 links in case you would like to look at the folders.  Works for me but not
 for everyone who's tried it.
 
 http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
 
 thanks,
 dana
 
 -- 
 Dana Pearson
 dbpearsonmlis.com
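Dana's five folders, each in UTF-8 and MARC-8 versions, could be published as one Zip file per folder, so that whichever host is chosen a library downloads a single file per category. A minimal Python sketch, assuming a local layout of one directory per category under a common root; `zip_each_folder` and the folder names in the test are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def zip_each_folder(root: str, out_dir: str) -> List[str]:
    """Write one <folder>.zip per immediate subfolder of `root` into `out_dir`.

    Returns the archive paths in folder-name order; `out_dir` is skipped if it sits inside `root`.
    """
    root_path = Path(root).resolve()
    out_path = Path(out_dir).resolve()
    out_path.mkdir(parents=True, exist_ok=True)
    archives = []
    for folder in sorted(p for p in root_path.iterdir() if p.is_dir() and p != out_path):
        # make_archive appends ".zip" itself and stores entries as "<folder>/<file>".
        archive = shutil.make_archive(
            base_name=str(out_path / folder.name),
            format="zip",
            root_dir=str(root_path),
            base_dir=folder.name,
        )
        archives.append(archive)
    return archives
```

Running it once on the UTF-8 tree and once on the MARC-8 tree (with different output directories) would give ten archives to upload.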


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Ross Singer
Or the Internet Archive, since there are also a whole bunch of other MARC dumps 
there.

-Ross.



Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Cary Gordon
I would put them on Dropbox or S3. The Dropbox free account is 5 GB.

Cary

-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Thanks for the replies...I had looked at GitHub but thought it was something
different, i.e., collaborative software development...I will look again

hadn't thought of the Internet Archive but that might be good and I'll take
a look at Dropbox and Eric's other suggestions...altogether new to the
'cloud'

and regarding MARC records on the Project Gutenberg page...there is a new
feature that converts RDF/DC to MARC but the download was small so I
suspect only recent additions...in fact, the necessary editing would remain
but may be useful for keeping my work up to date...I'll be interested to
see how it handles line feeds in dc:title elements.

thanks again for the suggestions, including Cary's, which came in as I typed
this

dana
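On Dana's question about line feeds in dc:title: her Vicomte de Bragelonne record shown in this thread puts the text after the line break into 245 $b. The Python sketch below takes that as an assumption (the first line break separates $a from $b; any later breaks become spaces) rather than a documented Project Gutenberg rule, and it leaves ending punctuation such as the ` /` before $c to later steps. `title_subfields` is a hypothetical name.

```python
from typing import Optional


def title_subfields(dc_title: str, gmd: Optional[str] = None) -> str:
    """Map a dc:title to 245 subfields, treating the first line break as the $a/$b boundary."""
    text = dc_title.replace("\r\n", "\n").replace("\r", "\n").strip()
    first, _, rest = text.partition("\n")
    title_proper = " ".join(first.split())  # collapse runs of spaces/tabs
    remainder = " ".join(rest.split())      # any later line breaks become single spaces
    field = f"$a{title_proper}"
    if gmd:
        field += f"$h{gmd}"  # e.g. "[electronic resource]", as in Dana's records
    if remainder:
        field += f" :$b{remainder}"
    return field


print(title_subfields("The Vicomte de Bragelonne\nOr Ten Years Later", "[electronic resource]"))
# $aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later
```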

-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Owen Stephens
On 12 Jun 2013, at 14:06, Dana Pearson dbpearsonm...@gmail.com wrote:

 Thanks for the replies..I had looked at GitHub but thought it something
 different, ie, collaborative software development...I will look again

Yes - that's the main use (git is version control software, GitHub hosts git 
repositories) - but of course git doesn't care what types of files you have 
under version control. It came to mind because I know it's been used to 
distribute metadata files before - e.g. this set of metadata from the Cooper 
Hewitt National Design Museum https://github.com/cooperhewitt/collection

There could be some additional benefits gained through using git to version 
control this type of file, and GitHub to distribute them if you were 
interested, but it can act as simply a place to put the files and make them 
available for download. But of course the other suggestions would do this 
simpler task just as well.

Owen


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Ford, Kevin
Hi Dana,

Out of curiosity, how does your crosswalk differ from Project Gutenberg's MARC 
files?  See, e.g.:

http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress 
Washington, DC



Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Ford, Kevin
Doh!

I read all the emails in the thread except for Eric's, which asked the same 
question.

Either way, his or mine, nevertheless curious.

Kevin

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Eric Phetteplace
 Sent: Tuesday, June 11, 2013 10:57 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] best way to make MARC files available to anyone
 
 Dana - perhaps a public Dropbox folder? Or just put the files up on
 your site somewhere, served with a Content-Disposition: attachment
 header so they trigger a download when accessed? E.g. here's a
 StackOverflow thread on that:
 http://stackoverflow.com/questions/9195304/how-to-use-content-disposition-for-force-a-file-to-download-to-the-hard-drive
 If they must be a recognized MIME type, you could compress
 them as .zip or .tar.gz files on the server, which would reduce
 download time either way.
 
 I did try clicking the links on your site and they never downloaded,
 the request just timed out.
 
 Not to discredit what you're doing, which is great, but aren't MARC
 records already available for Project Gutenberg? See their offline
 catalogs page:
 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28offsite.29
 
 Best,
 Eric Phetteplace
 Emerging Technologies Librarian
 Chesapeake College
 Wye Mills, MD
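Eric's Content-Disposition suggestion can be tried locally with Python's built-in web server before changing anything on a live site. A minimal sketch, assuming the MARC files sit in one directory and use `.mrc`, `.mrk` or `.zip` extensions; the suffix list, `AttachmentHandler` and `make_server` are illustrative names, not part of any existing tool.

```python
import functools
import os
import urllib.parse
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# File extensions to force as downloads: an assumption, adjust to what is actually published.
DOWNLOAD_SUFFIXES = (".mrc", ".mrk", ".zip")


class AttachmentHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds Content-Disposition: attachment for MARC files."""

    def end_headers(self):
        # Drop any query string and percent-encoding to recover the requested file name.
        path = urllib.parse.unquote(self.path.split("?", 1)[0])
        filename = os.path.basename(path)
        if filename.lower().endswith(DOWNLOAD_SUFFIXES):
            self.send_header("Content-Disposition", f'attachment; filename="{filename}"')
        super().end_headers()


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build (without starting) a local server for `directory`; port 0 picks any free port."""
    handler = functools.partial(AttachmentHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("records").serve_forever()` and opening http://127.0.0.1:8000/ in a browser should show the download prompt; on a hosted site the same header would usually be set in the web server's own configuration instead.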
 


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Daniel Lovins
If anyone from HathiTrust is watching this thread, I'd also be curious if
they're considering bulk record downloads via something other than OAI [1].

Thanks.

Daniel
[1] http://www.lib.umich.edu/michigan-digitization-project-oai-harvesting
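For comparison with the download routes discussed in this thread, an OAI-PMH harvest pages through `ListRecords` responses, each ending with a `resumptionToken` until the final page. A minimal Python sketch of that paging loop using only the standard library; the base URL is left to the caller, the helper names are hypothetical, and the `marc21` metadata prefix is an assumption to confirm against the repository's `ListMetadataFormats` response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None, prefix: str = "marc21") -> str:
    """Build a ListRecords request; a resumptionToken is sent alone, without metadataPrefix."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def parse_page(xml_data) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, next resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_data)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # A missing or empty resumptionToken element marks the last page.
    return ids, token or None


def harvest_identifiers(base_url: str, prefix: str = "marc21") -> Iterator[str]:
    """Yield every record identifier, following resumptionTokens page by page."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token, prefix)) as response:
            ids, token = parse_page(response.read())
        yield from ids
        if token is None:
            break
```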



Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Kevin,

I don't know yet, since I don't know how to unzip the file (bz2?). In any
case, I'm guessing their records lack the post-transformation editing that
most libraries would insist upon. For example, subject headings in the
metadata are strings with hyphens separating subjects from subheadings, so
spatial, temporal, and genre subfields have to be introduced, and some
content needs to go into 600, 610, 611, 630, or 651 fields. For more on the
post-transform editing, see:

http://dbpearsonmlis.com/GPmetadata.html
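Splitting the hyphen-delimited subject strings into subfields is the kind of post-transform editing described above. Here is a minimal Python sketch, assuming the strings use " -- " as the delimiter (as in the 653 fields of the Project Gutenberg records). The form-term list and the year heuristic are hypothetical simplifications, and geographic ($z) subdivisions cannot be recognized by pattern alone, so picking the right tag (600, 610, 651, etc.) still needs a cataloger:

```python
import re

# Hypothetical sample of form/genre subdivisions to code as $v.
# A real crosswalk would need a much fuller, controlled list.
FORM_TERMS = {"Fiction", "Juvenile fiction", "Biography", "Drama", "Poetry"}

# Heuristic: a subdivision containing a four-digit year is chronological ($y).
YEAR_RE = re.compile(r"\b\d{4}\b")


def split_heading(heading: str) -> list[tuple[str, str]]:
    """Split a '--'-delimited subject string into (subfield code, value) pairs.

    Every subdivision that is neither a known form term nor dated is coded
    as topical ($x); geographic $z is not detected by this sketch.
    """
    parts = [p.strip() for p in heading.split("--") if p.strip()]
    if not parts:
        return []
    pairs = [("a", parts[0])]
    for part in parts[1:]:
        if part in FORM_TERMS:
            pairs.append(("v", part))
        elif YEAR_RE.search(part):
            pairs.append(("y", part))
        else:
            pairs.append(("x", part))
    return pairs


def to_mnemonic(pairs: list[tuple[str, str]]) -> str:
    """Render (code, value) pairs in MarcEdit-style '$a...' notation."""
    return "".join(f"${code}{value}" for code, value in pairs)


print(to_mnemonic(split_heading("France -- History -- Regency, 1715-1723 -- Fiction")))
# $aFrance$xHistory$yRegency, 1715-1723$vFiction
```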

dana


On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

 Hi Dana,

 Out of curiosity, how does your crosswalk differ from Project Gutenberg's
 MARC files?  See, e.g.:


 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

 Yours,
 Kevin

 --
 Kevin Ford
 Network Development and MARC Standards Office
 Library of Congress
 Washington, DC







-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Kevin, Eric

7-Zip worked fine to unzip it, and the records look pretty good, since they
used 653 and preserved the string from the metadata element, hyphens
included. However, the records do not use subfield $d in the 100 or 700
fields, and as a result the dates also appear in the 245$c. The 245$a seems
to go missing with some frequency, though MarcEdit does not report any
errors.
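The missing $d can often be patched mechanically. Here is a minimal Python sketch, assuming records in MarcEdit's mnemonic (.mrk) layout and names whose dates come last after a comma, as in `=100  1\$aDumas, Alexandre, 1802-1870`. The function name is hypothetical, only a lone $a is handled, and terminal punctuation is left alone:

```python
import re

# Matches an $a value ending in ", YYYY-" or ", YYYY-YYYY".
# Assumption: PG-style names always put the dates last, after a comma.
NAME_DATES_RE = re.compile(r"^(?P<name>.+?),\s*(?P<dates>\d{4}-(?:\d{4})?)$")


def move_dates_to_d(line: str) -> str:
    """Rewrite a mnemonic 100/700 line so trailing dates go into $d.

    Lines that are not 100/700 fields, that carry other subfields after
    $a, or whose $a has no trailing dates are returned unchanged.
    """
    if not line.startswith(("=100  ", "=700  ")):
        return line
    prefix, sep, value = line.partition("$a")
    if not sep or "$" in value:
        return line  # no $a, or more subfields follow: leave for manual review
    match = NAME_DATES_RE.match(value)
    if match is None:
        return line
    return f"{prefix}$a{match['name']},$d{match['dates']}"


print(move_dates_to_d(r"=100  1\$aDumas, Alexandre, 1802-1870"))
# =100  1\$aDumas, Alexandre,$d1802-1870
```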

My original intent was just to keep my XSLT skills sharp while I had some
free time last August. After creating the stylesheet, I had no free time
until January, when I could devote 2 or 3 hours to the post-transform
editing. I thought I'd just dive in, but the pool was much deeper than I had
anticipated.

I do think libraries will prefer my edited versions, although they differ in
non-access points as well. Incidentally, there have not been many additions
since my harvest.

Here is the first record in the Project Gutenberg-produced records:

=LDR  00721cam a22002293a 4500
=001  27384
=003  PGUSA
=008  081202s2008xxu|s|000\|\eng\d
=040  \\$aPGUSA$beng
=042  \\$adc
=050  \4$aPQ
=100  1\$aDumas, Alexandre, 1802-1870
=245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260  \\$bProject Gutenberg,$c2008
=500  \\$aProject Gutenberg
=506  \\$aFreely available.
=516  \\$aElectronic text
=653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830  \0$aProject Gutenberg$v27384
=856  40$uhttp://www.gutenberg.org/etext/27384
=856  42$uhttp://www.gutenberg.org/license$3Rights
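The 245 in this record shows the missing-title problem: its $a is empty. A rough Python sketch for flagging such records, assuming one record's mnemonic text at a time, one field per line, and no literal dollar signs in the field data; the function name is hypothetical:

```python
def missing_title_proper(record_text: str) -> bool:
    """Return True if a mnemonic record has no 245 field or no non-empty 245$a.

    Assumes each field sits on one line as '=TAG  II$a...', where II are the
    two indicator characters, and that field data contains no literal '$'.
    """
    for line in record_text.splitlines():
        if line.startswith("=245  "):
            body = line[8:]  # drop "=245  " (6 chars) and the two indicators
            # Each "$"-separated chunk begins with its one-letter subfield code.
            title_values = [
                chunk[1:].strip() for chunk in body.split("$")[1:] if chunk.startswith("a")
            ]
            return not any(title_values)
    return True  # no 245 field at all


pg_record = "=001  27384\n=245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas"
print(missing_title_proper(pg_record))  # True
```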

I couldn't readily find the above item, but here's an example of one of my
records by the same author.

=LDR  01002nam a22002535  4500
=001  PG18997
=006  md
=007  cr||n\|||muaua
=008  \\s2006utu|o|||eng\d
=042  \\$adc
=090  \\$aPQ
=092  \0$aeBooks
=100  1\$aDumas, Alexandre,$d1802-1870.
=245  14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years Later being the completion of The Three Musketeers And Twenty Years After /$cAlexandre Dumas.
=260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive Foundation,$c2006.
=300  \\$a1 online resource :$bmultiple file formats.
=500  \\$aRecords generated from Project Gutenberg RDF data.
=540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650  \0$aAdventure stories.
=650  \0$aHistorical fiction.
=651  \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655  \0$aElectronic books.
=710  2\$aProject Gutenberg.
=856  40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

thanks for your interest.

regards,
dana






-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
-- 
Dana Pearson
dbpearsonmlis.com


[CODE4LIB] best way to make MARC files available to anyone

2013-06-11 Thread Dana Pearson
I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I would
like to make these files available to any library that is interested.

I thought I would put them on my website via FTP, but I don't know if that
is the best way. I don't have an FTP client myself, so I was thinking that
FTP may now be passé.

I tried using Google Drive, sharing links to two versions of the files,
UTF-8 and MARC-8. However, that does not seem to be a viable solution. With
access set to anyone with the link, I can open the files from the URLs
provided, but neither those URLs nor the links on my webpage work for some
of the people testing them for me.

I have five folders of files, about 38 MB in total. I have separated the
ebooks, audiobooks, juvenile content, miscellaneous items, and non-Latin
scripts such as Chinese and Modern Greek. Most of the content is in the
ebook folder.
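One low-effort option for the five folders is to offer one archive per folder, so each library downloads a single file instead of following many per-file links. A minimal Python sketch using only the standard library; the source and output paths are hypothetical, and the output directory should sit outside the source directory so it does not get zipped itself:

```python
import shutil
from pathlib import Path


def bundle_folders(source: Path, dest: Path) -> list[Path]:
    """Write one .zip per immediate subfolder of `source` into `dest`.

    Archive members are stored relative to each subfolder, so unzipping
    ebooks.zip yields the MARC files directly rather than a nested path.
    """
    dest.mkdir(parents=True, exist_ok=True)
    archives = []
    for folder in sorted(p for p in source.iterdir() if p.is_dir()):
        # make_archive appends ".zip" to the base name and returns the full path.
        archive = shutil.make_archive(str(dest / folder.name), "zip", root_dir=folder)
        archives.append(Path(archive))
    return archives
```

For example, `bundle_folders(Path("marc"), Path("downloads"))` would turn hypothetical folders `marc/ebooks` and `marc/audiobooks` into `downloads/ebooks.zip` and `downloads/audiobooks.zip`, which can then be linked from a web page.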

I would like to make access as easy as possible.

Here's the link to my page with the folder links, in case you would like to
look at them. Google Drive seems to work for me, but not for everyone who's
tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-11 Thread Eric Phetteplace
Dana - perhaps a public Dropbox folder? Or just put the files up on your
site somewhere, served with a Content-Disposition: attachment header so
they trigger a download when accessed? E.g. here's a StackOverflow thread
on that:
http://stackoverflow.com/questions/9195304/how-to-use-content-disposition-for-force-a-file-to-download-to-the-hard-drive
If they need to be served as a recognized MIME type, you could compress
them as .zip or .tar.gz files on the server, which would reduce download
time either way.
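To illustrate the Content-Disposition suggestion, here is a sketch using Python's standard-library web server. It only demonstrates the header; a hosted site would normally set it in its own web server configuration instead. The suffix list, port, and function names are assumptions, not anything from the thread:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import quote, unquote

# Hypothetical choice of which file types should force a download.
DOWNLOAD_SUFFIXES = (".mrc", ".zip", ".tar.gz")


def content_disposition(filename: str) -> str:
    """Build an 'attachment' value with an ASCII fallback name plus a
    UTF-8 percent-encoded filename* parameter (RFC 6266 style)."""
    fallback = "".join(c if c.isascii() and c not in '"\\' else "_" for c in filename)
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{quote(filename)}"


class DownloadHandler(SimpleHTTPRequestHandler):
    """Static file handler that marks MARC and archive files as attachments."""

    def send_response(self, code, message=None):
        self._status = code  # remember the status so error pages are not marked
        super().send_response(code, message)

    def end_headers(self):
        path = unquote(self.path.split("?", 1)[0])
        if getattr(self, "_status", None) == 200 and path.endswith(DOWNLOAD_SUFFIXES):
            name = path.rsplit("/", 1)[-1]
            self.send_header("Content-Disposition", content_disposition(name))
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current working directory until interrupted."""
    HTTPServer(("", port), DownloadHandler).serve_forever()
```

Calling `serve()` from the folder holding the files would serve them at http://localhost:8000/, and a request for a hypothetical `ebooks.zip` would come back with `Content-Disposition: attachment; filename="ebooks.zip"; filename*=UTF-8''ebooks.zip`, prompting the browser to save the file rather than display it.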

I did try clicking the links on your site, and they never downloaded; the
request just timed out.

Not to discredit what you're doing, which is great, but aren't MARC records
already available for Project Gutenberg? See their offline catalogs page:
http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28offsite.29

Best,
Eric Phetteplace
Emerging Technologies Librarian
Chesapeake College
Wye Mills, MD

