Re: [CODE4LIB] library of congress call number subject coding

2014-09-03 Thread Dana Pearson
yes, that works, thanks Bilal...very impressive

regards,
dana


On Wed, Sep 3, 2014 at 11:06 AM, Bilal Khalid bilal.kha...@utoronto.ca
wrote:

 Apologies! Here's a link that should be more durable:
 http://www.library.utoronto.ca/bilal/lc_dimension.xml

 Regards,
 -Bilal

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Dana Pearson
 Sent: Tuesday, September 02, 2014 8:53 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] library of congress call number subject coding

 Hi Bilal,

 sounds very interesting but the link does not connect to anything

 don't have an immediate need but i work with XSL, MARCXML and would be fun
 to experiment

 regards,
 dana


 On Tue, Sep 2, 2014 at 4:24 PM, Bilal Khalid bilal.kha...@utoronto.ca
 wrote:

  Hi Ken,
 
   Here's a link to an XML mapping of LC call number ranges to
   categories that we use in our indexing software. It may be a bit hefty
   for your needs (almost 6000 mappings), but hope it helps!
 
  http://bilalk.library.utoronto.ca/lc_dimension.xml
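
The structure of lc_dimension.xml is not reproduced in the thread. Purely as an illustration of the kind of prefix-to-group lookup Ken is asking for (the table entries, element names, and function name below are invented, not Bilal's actual data), a minimal XSLT 2.0 sketch:

<!-- Sketch only: map a 1-3 letter LC class prefix to a broad subject group.
     The mapping table and names are illustrative placeholders. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:lc">

  <xsl:variable name="map" as="element()*">
    <group prefix="N"  label="Arts"/>
    <group prefix="PQ" label="Humanities"/>
    <group prefix="QA" label="Sciences"/>
    <group prefix="HM" label="Social Sciences"/>
  </xsl:variable>

  <!-- Longest matching prefix wins, so "PQ" would beat "P" if both were listed. -->
  <xsl:function name="local:lc-group" as="xs:string?">
    <xsl:param name="callno" as="xs:string"/>
    <xsl:variable name="hits" as="xs:string*">
      <xsl:for-each select="$map[starts-with(upper-case($callno), @prefix)]">
        <xsl:sort select="string-length(@prefix)" order="descending"/>
        <xsl:sequence select="string(@label)"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="$hits[1]"/>
  </xsl:function>

  <!-- Example: local:lc-group('QA76.73 .P98') returns 'Sciences'. -->
</xsl:stylesheet>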
 
  Cheers,
  -Bilal
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of Ken Irwin
  Sent: Tuesday, September 02, 2014 4:42 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] library of congress call number subject coding
 
  Hi folks,
 
  Does anyone have a handy scheme for coding LC call numbers into just a
  few broad subject areas (e.g. Arts, Humanities, Sciences, Social
  Sciences) or perhaps something only a little more granular than that?
 
  I'm hoping for a list that will turn 1-3 letter LC classes into
  subject groups, and I'd rather not reinvent the wheel if someone's
  already got something.
 
  Any leads?
 
  Thanks
  Ken
 



 --
 Dana Pearson
 dbpearsonmlis.com
 Metadata and Bibliographic Services for Libraries




-- 
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries


Re: [CODE4LIB] library of congress call number subject coding

2014-09-02 Thread Dana Pearson
Hi Bilal,

sounds very interesting but the link does not connect to anything

don't have an immediate need but i work with XSL, MARCXML and would be fun
to experiment

regards,
dana


On Tue, Sep 2, 2014 at 4:24 PM, Bilal Khalid bilal.kha...@utoronto.ca
wrote:

 Hi Ken,

 Here's a link to an XML mapping of LC call number ranges to categories
 that we use in our indexing software. It may be a bit hefty for your needs
 (almost 6000 mappings), but hope it helps!

 http://bilalk.library.utoronto.ca/lc_dimension.xml

 Cheers,
 -Bilal

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Ken Irwin
 Sent: Tuesday, September 02, 2014 4:42 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] library of congress call number subject coding

 Hi folks,

 Does anyone have a handy scheme for coding LC call numbers into just a few
 broad subject areas (e.g. Arts, Humanities, Sciences, Social Sciences) or
 perhaps something only a little more granular than that?

 I'm hoping for a list that will turn 1-3 letter LC classes into subject
 groups, and I'd rather not reinvent the wheel if someone's already got
 something.

 Any leads?

 Thanks
 Ken




-- 
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries


Re: [CODE4LIB] metadata for free ebook repositories

2014-08-18 Thread Dana Pearson
Hi Stuart,

I've done RDF/DC to MARC for the Gutenberg Project.  It requires a lot of
clean-up, especially with respect to subject heading strings, since LCSH
might well appear in a DC element but need to be parsed into MARC subfields.
Tedious; human intervention is required in the case of the Gutenberg Project.
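
To make the shape of that clean-up concrete: a minimal XSLT 2.0 sketch (an illustration written for this archive, not Dana's actual stylesheet) that splits one hyphen-delimited dc:subject string into a MARCXML 650 with $a plus generic $x subdivisions. It deliberately cannot tell which subdivisions are really chronological ($y), geographic ($z), or form ($v); that is exactly where the human intervention comes in.

<!-- Sketch only: split a string such as
     "France -- History -- Regency, 1715-1723 -- Fiction" into a 650. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:output indent="yes"/>

  <xsl:template match="dc:subject">
    <marc:datafield tag="650" ind1=" " ind2="0">
      <xsl:for-each select="tokenize(., '\s+--\s+')">
        <!-- First piece becomes $a; everything after is a generic $x. -->
        <marc:subfield code="{if (position() = 1) then 'a' else 'x'}">
          <xsl:value-of select="normalize-space(.)"/>
        </marc:subfield>
      </xsl:for-each>
    </marc:datafield>
  </xsl:template>
</xsl:stylesheet>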

Close to finishing the editing of about 4000 records harvested in late
December, 2014; about 16 months after an initial harvest of about 40,000.

The RDF/DC had changed somewhat, but it seemed to have significantly fewer
subject headings.  I decided to examine virtually every item and to find
better records at the Library of Congress or, more frequently, the Internet
Archive [ archive.org/details/texts ].

Fully agree how important it is, but I don't think I'll do it again since it
consumes all my free time.  Maybe if others could volunteer to do that, I
could continue harvesting.  Only a download of the complete collection is
possible, but I use XSL to select records based on date added.

The collections you mention are worthy of being included in library
systems.  Metadata quality is a limiting factor.

regards,
dana


On Mon, Aug 18, 2014 at 5:04 PM, Stuart Yeates stuart.yea...@vuw.ac.nz
wrote:

 There are a stack of great free ebook repositories available on the web,
 things like https://unglue.it/ http://www.gutenberg.org/
 https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/
 https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc

 What there doesn't appear to be, is high-quality AACR2 / RDA records
 available for these. There are things like
 https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate dublin core to
 MARC converters, but these
 lack standardisation of names, authority control (people, entities, places,
 etc), interlinking, etc.

 It seems to me that quality metadata would greatly increase the value /
 findability / use of these projects and thus their visibility and available
 sources.

 Are there any projects working in this space already? Are there suitable
 tools available?

 cheers
 stuart




-- 
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries


Re: [CODE4LIB] metadata for free ebook repositories

2014-08-18 Thread Dana Pearson
Karen,

It seems to me that the Open Library would want to broaden use of this
great collection as much as possible.  Yet, MARC records for the 1/3  or so
items in the collection cannot be downloaded so that they could be imported
into local library systems.

Lots of users search local libraries but might well use Google, Open
Library, and the Internet Archive less frequently for finding ebooks.

I'll look at Tom Morris's code to see if I might automate selection of Open
Library records by comparing them with elements of the MARCXML records for
this last group of Gutenberg Project additions.  Thanks for that information.

regards,
dana


On Mon, Aug 18, 2014 at 6:57 PM, Karen Coyle li...@kcoyle.net wrote:

 About 1/3 of the 1M ebooks on OpenLibrary.org have full MARC records, and
 you can retrieve the record via the API. There is also a secret record
 format that returns not the full MARC for the hard copy (which is what the
 records represent because these are digitized books) but a record that has
 been modified to represent the ebook.

 The MARC records for the hard copy follow the pattern:

  https://archive.org/download/[archive identifier]/[archive identifier]_marc.[xml|mrc]

  Download MARC XML:
  https://archive.org/download/myantonia00cathrich/myantonia00cathrich_marc.xml

  Download MARC binary:
  https://archive.org/download/myantonia00cathrich/myantonia00cathrich_meta.mrc


 To get the one that represents the ebook, do:

  https://archive.org/download/[archive identifier]/[archive identifier]_archive_marc.xml

  https://archive.org/download/myantonia00cathrich/myantonia00cathrich_archive_marc.xml

 This one has an 007, the 245 $h, and a few other things.
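
As a small illustration of those two patterns (using the identifier from the example above, and assuming an XSLT 2.0 processor such as Saxon that is permitted to dereference http URLs with document()):

<!-- Sketch: build the two archive.org MARCXML URLs for an identifier and
     copy both records into a wrapper element. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="fetch-archive-marc">
    <!-- Identifier from the example above. -->
    <xsl:param name="id" select="'myantonia00cathrich'"/>
    <xsl:variable name="base"
        select="concat('https://archive.org/download/', $id, '/', $id)"/>
    <!-- Record describing the scanned hard copy. -->
    <xsl:variable name="print" select="document(concat($base, '_marc.xml'))"/>
    <!-- Record modified to describe the ebook (007, 245 $h, etc.). -->
    <xsl:variable name="ebook" select="document(concat($base, '_archive_marc.xml'))"/>
    <records>
      <xsl:copy-of select="$print"/>
      <xsl:copy-of select="$ebook"/>
    </records>
  </xsl:template>
</xsl:stylesheet>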

 Tom Morris did some code that helps you search for books by author and
 title and retrieve a MARC record. I don't recall where his github archive
 is, but I'll find out and post it here. The code is open source. We used it
 for a project that added ebook records to a public library catalog.

 You can also use the OpenLibrary API to select all open access ebooks.
 What I'd like to see is a way to create a list or bibliography in OL that
 then is imported into a program that will find MARC records for those
 books. The list function is still under development, though.

 kc


 On 8/18/14, 3:04 PM, Stuart Yeates wrote:

 There are a stack of great free ebook repositories available on the web,
 things like https://unglue.it/ http://www.gutenberg.org/
 https://en.wikibooks.org/wiki/Main_Page http://www.gutenberg.net.au/
 https://www.smashwords.com/books/category/1/newest/0/free/any etc, etc

 What there doesn't appear to be, is high-quality AACR2 / RDA records
 available for these. There are things like
 https://ebooks.adelaide.edu.au/meta/pg/ which are elaborate dublin core to
 MARC converters, but
 these lack standardisation of names, authority control (people, entities,
 places, etc), interlinking, etc.

 It seems to me that quality metadata would greatly increase the value /
 findability / use of these projects and thus their visibility and available
 sources.

 Are there any projects working in this space already? Are there suitable
 tools available?

 cheers
 stuart


 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 m: +1-510-435-8234
 skype: kcoylenet/+1-510-984-3600




-- 
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries


Re: [CODE4LIB] Excel to XML

2014-06-13 Thread Dana Pearson
I don't use Excel, but a client did who wanted to use XSL I had created (ONIX
to MARC) to transform bibliographic metadata in Excel to XML.  The built-in
Excel XML converter was not very helpful since empty cells were skipped, so
that it was impossible to use that result.

There is an add-on that allows you to map your data to XML elements by
creating a schema, which is pretty cool.

http://bit.ly/1jpwtqM

This might be helpful.
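
Not the add-on route, but since much of this thread already runs on XSLT: a minimal XSLT 2.0 sketch that reads a CSV export with unparsed-text() and emits an element per cell, keeping empty cells (the thing the built-in converter dropped). The file name, output element names, and the naive comma split are all just for illustration.

<!-- Sketch only: CSV to simple XML.  Run the named template "main"
     (e.g. Saxon's -it:main option). -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <!-- Hypothetical input file; adjust path and delimiter as needed. -->
  <xsl:param name="csv-uri" select="'metadata.csv'"/>

  <xsl:template name="main">
    <rows>
      <xsl:for-each select="tokenize(unparsed-text($csv-uri), '\r?\n')[normalize-space()]">
        <row>
          <!-- Naive split: does not handle quoted commas inside a cell. -->
          <xsl:for-each select="tokenize(., ',')">
            <cell><xsl:value-of select="normalize-space(.)"/></cell>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </rows>
  </xsl:template>
</xsl:stylesheet>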

regards,
dana


On Fri, Jun 13, 2014 at 6:53 PM, Terry Brady tw...@georgetown.edu wrote:

 The current version of Excel offers a save as XML option.

 It will produce something like this.  There is other wrapping metadata, but
 the table is pretty easy to parse.

   <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7" x:FullColumns="1"
      x:FullRows="1" ss:DefaultRowHeight="15">
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
     <Cell><Data ss:Type="String">question 1</Data></Cell>
     <Cell><Data ss:Type="String">answer 1</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell>
     <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 3</Data></Cell>
     <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 4</Data></Cell>
     <Cell><Data ss:Type="String">question 2</Data></Cell>
     <Cell><Data ss:Type="String">answer 1</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 5</Data></Cell>
     <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"><Data ss:Type="String">row 6</Data></Cell>
     <Cell><Data ss:Type="String">quest</Data></Cell>
     <Cell><Data ss:Type="String">answer 3</Data></Cell>
    </Row>
    <Row>
     <Cell ss:StyleID="s62"/>
    </Row>
   </Table>


 On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote:

  Hello -
 
  I have an Excel spreadsheet that, for the purposes of an easy import into
  a Drupal site, I'd like to convert to XML.  I know people more
  knowledgeable than I could code up something in Python or Perl to
 convert a
  CSV version of the data to XML (and I have a colleague who offered to do
  just that for me), but I am looking for recommendations for something
 more
  immediately accessible.
 
  Here's an idea of how the spreadsheet is structured:
 
   Row1   Question1   Q1Answer1
   Row2               Q1Answer2
   Row3               Q1Answer3
   Row4   Question2   Q2Answer1
   Row5               Q2Answer2
   Row6   Question3   Q3Answer1
  etc.
 
  How do other people approach this?  Import the data to an SQL database,
  write some clever queries, and then export that to XML?  Work some
 wizardry
  in GoogleRefine/OpenRefine?  Are scripting languages really the best all
  around solution?  Excel's built in XML mapping function wasn't able to
  process the one-to-many relationship of questions to answers, though
 maybe
  I just don't know how to build the mapping structure correctly.
 
   In the interest of imminent deadlines, I have handed the spreadsheet off to
  my Perl-writing colleague.  But as a professional growth opportunity, I'm
  interested in suggestions from Libraryland about ways others have
  approached this successfully.
 
  Thanks!
 
  Ryan Engel
  Web Stuff
  UW-Madison
 



 --
 Terry Brady
 Applications Programmer Analyst
 Georgetown University Library Information Technology
 https://www.library.georgetown.edu/lit/code
 425-298-5498




-- 
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries


Re: [CODE4LIB] Python and Ruby

2013-07-29 Thread Dana Pearson
Josh,

I work exclusively with XSLT but specialize in metadata only, so no need for
content display choices.

maybe a candidate for a library programming language...XSLT 2.0 has a useful
analyze-string element to cover Roy's point.
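
For anyone who has not met it, a small self-contained example of that element; here it pulls the life dates out of a heading such as the ones in the Project Gutenberg records discussed elsewhere in this archive (the regex and the output elements are only for illustration):

<!-- Example of xsl:analyze-string (XSLT 2.0): split life dates out of a
     heading such as "Dumas, Alexandre, 1802-1870". -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="split-heading">
    <xsl:param name="heading" select="'Dumas, Alexandre, 1802-1870'"/>
    <xsl:analyze-string select="$heading" regex="(\d{{4}})-(\d{{4}})">
      <xsl:matching-substring>
        <dates birth="{regex-group(1)}" death="{regex-group(2)}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Trim the trailing comma left in front of the dates. -->
        <name><xsl:value-of select="normalize-space(replace(., ',\s*$', ''))"/></name>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>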

by the way, Josh, I live just down the road in Leeton

regards,
dana


On Mon, Jul 29, 2013 at 12:04 PM, Roy Tennant roytenn...@gmail.com wrote:

 On Mon, Jul 29, 2013 at 9:57 AM, Peter Schlumpf pschlu...@earthlink.net
 wrote:
  Imagine if the library community had its own programming/scripting
 language, at least one that is domain relevant.
  What would it look like?

 Whatever else it had, it would have to have a sophisticated way to
 inspect text for patterns -- that is, regular expressions.
 Roy




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Dana Pearson
Thanks Owen,

I conflated github and dropbox in my earlier summary and left out any
reference to dropbox...they do the email requirement...sorry...it was late
and a hurried summary...will look again for that download option on github

thanks again,
dana


On Thu, Jun 13, 2013 at 9:09 AM, Owen Stephens o...@ostephens.com wrote:

 On 13 Jun 2013, at 02:57, Dana Pearson dbpearsonm...@gmail.com wrote:

  quick followup on the thread..
 
  github:  I looked at the cooperhewitt collection but don't see a way to
  download the content...I could copy and paste their content but that may
  not be the best approach for my files...documentation is thin, seems i
  would have to provide email addresses for those seeking access...but
  clearly that is not the case with how the cooperhewitt archive is
  configured..
 
  My primary concern has been to make it as simple a process as possible
 for
  libraries which have limited technical expertise.

 I suspect from what you say that GitHub is not what you want in this case.
 However, I just wanted to clarify that you can download files as a Zip file
 (e.g. for Cooper Hewitt
 https://github.com/cooperhewitt/collection/archive/master.zip), and that
 this link is towards the top left on each screen in GitHub. The repository
 is a public one (which is the default, and only option unless you have a
 paid account on GitHub) and you do not need to provide email addresses or
 anything else to access the files on a public repository

 Owen




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Dana Pearson
thanks, Kevin...did notice that one of the records I showed lacked the c
after the $ in the 245...very odd since the stylesheet constructs that
subfield and I would have had no reason to touch that particular
one...phantom bytes?

dana


On Thu, Jun 13, 2013 at 2:20 PM, Ford, Kevin k...@loc.gov wrote:

 Dear Dana,

 Thanks for the detail.  Based on the few example comparisons I've seen, I
 very much like your MARC records more.  Not only are they richer, they
 break up the data better.

 Yours,
 Kevin


  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Dana Pearson
  Sent: Wednesday, June 12, 2013 7:20 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] best way to make MARC files available to anyone
 
  Kevin, Eric
 
  7zip worked fine to unzip and records look pretty good since they used
   653 and preserved the string from the metadata element with the hyphens.
   However the records do not do subfield d in 100 or 700 fields and
  thus such content appears in the 245$c.  245$a seems to go missing with
  some frequency.  MarcEdit does not report any errors though.
 
  My original intent was just to keep my XSLT skills sharp while I had
  some free time last August.  After creating the stylesheet, I then had
  no free time until January when I could devote 2 or 3 hours to the post
  transform editing.  Thought I'd just dive in but the pool was much
  deeper than I had anticipated.
 
  Do think libraries will prefer my edited versions although different in
  non-access points as well.  Incidentally, not many additions since my
  harvest.
 
  First record in the Project Gutenberg produced records:
 
  =LDR  00721cam a22002293a 4500
  =001  27384
  =003  PGUSA
  =008  081202s2008xxu|s|000\|\eng\d
  =040  \\$aPGUSA$beng
  =042  \\$adc
  =050  \4$aPQ
  =100  1\$aDumas, Alexandre, 1802-1870
  =245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
  =260  \\$bProject Gutenberg,$c2008
  =500  \\$aProject Gutenberg
  =506  \\$aFreely available.
  =516  \\$aElectronic text
  =653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
  =653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
  =830  \0$aProject Gutenberg$v27384
  =856  40$uhttp://www.gutenberg.org/etext/27384
  =856  42$uhttp://www.gutenberg.org/license$3Rights
 
  couldn't readily find the above item but here's an example of my
  records by the same author.
 
  =LDR  01002nam a22002535  4500
  =001  PG18997
  =006  md
  =007  cr||n\|||muaua
  =008  \\s2006utu|o|||eng\d
  =042  \\$adc
  =090  \\$aPQ
  =092  \0$aeBooks
  =100  1\$aDumas, Alexandre,$d1802-1870.
  =245  14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten
  Years Later being the completion of The Three Musketeers And Twenty
  Years After /$Alexandre Dumas.
  =260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive
  Foundation,$c2006.
  =300  \\$a1 online resource :$bmultiple file formats.
  =500  \\$aRecords generated from Project Gutenberg RDF data.
  =540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
  =650  \0$aAdventure stories.
  =650  \0$aHistorical fiction.
  =651  \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
  =655  \0$aElectronic books.
  =710  2\$aProject Gutenberg.
  =856  40$uhttp://www.gutenberg.org/etext/18997$zClick to access.
 
  thanks for your interest..
 
  regards,
  dana
 
 
  On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:
 
   Hi Dana,
  
   Out of curiosity, how does your crosswalk differ from Project
   Gutenberg's MARC files?  See, e.g.:
  
  
  
  http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_
   .28automatically_generated.29
  
   Yours,
   Kevin
  
   --
   Kevin Ford
   Network Development and MARC Standards Office Library of Congress
   Washington, DC
  
  
  
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
  Behalf
Of Dana Pearson
Sent: Tuesday, June 11, 2013 9:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] best way to make MARC files available to anyone
   
I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.
  I
would like to make these files available to any library that is
interested.
   
I thought that I would put them on my website via FTP but don't
  know
if that is the best way.  Don't have an ftp client myself so was
thinking that that may be now passé.
   
I tried using Google Drive with access available via the link to
  two
versions of the files, UTF8 and MARC8.  However, it seems that that
is not a viable solution.  I can access the files with the URLs
provided by setting the access to anyone with the URL but doesn't
work for some of those testing it for me or with the links I have
  on my webpage..
   
I have five folders with files of about 38 MB total.  I have
separated the ebooks, audio books, juvenile

Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Thanks for the replies..I had looked at GitHub but thought it something
different, ie, collaborative software development...I will look again

hadn't thought of the Internet archive but that might be good and I'll take
a look at dropbox and Eric's other suggestions...altogether new to the
'cloud'

and regarding MARC records on the Gutenberg Project page...there is a new
feature that converts RDF/DC to MARC  but the download was small so I
suspect only recent additions...in fact, the necessary editing would remain
but may be useful for keeping my work up to date...I'll be interested to
see how it handles new line feeds in dc:title elements.

thanks again for the suggestions including Cary's that comes in as I type
this

dana




On Wed, Jun 12, 2013 at 6:09 AM, Ross Singer rossfsin...@gmail.com wrote:

 Or the Internet Archive, since there are also a whole bunch of other MARC
 dumps there.

 -Ross.

 On Jun 12, 2013, at 4:25 AM, Owen Stephens o...@ostephens.com wrote:

  Putting the files on GitHub might be an option - free for public
 repositories, and 38Mb should not be a problem to host there
 
  Owen
 
  Owen Stephens
  Owen Stephens Consulting
  Web: http://www.ostephens.com
  Email: o...@ostephens.com
  Telephone: 0121 288 6936
 
  On 12 Jun 2013, at 02:24, Dana Pearson dbpearsonm...@gmail.com wrote:
 
  I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I
 would
  like to make these files available to any library that is interested.
 
  I thought that I would put them on my website via FTP but don't know if
  that is the best way.  Don't have an ftp client myself so was thinking
 that
  that may be now passé.
 
  I tried using Google Drive with access available via the link to two
  versions of the files, UTF8 and MARC8.  However, it seems that that is
 not
  a viable solution.  I can access the files with the URLs provided by
  setting the access to anyone with the URL but doesn't work for some of
  those testing it for me or with the links I have on my webpage..
 
  I have five folders with files of about 38 MB total.  I have separated
 the
  ebooks, audio books, juvenile content, miscellaneous and non-Latin
 scripts
  such as Chinese, Modern Greek.  Most of the content is in the ebook
 folder.
 
  I would like to make access as easy as possible.
 
  Google Drive seems to work for me.  Here's the link to my page with the
  links in case you would like to look at the folders.  Works for me but
 not
  for everyone who's tried it.
 
  http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
 
  thanks,
  dana
 
  --
  Dana Pearson
  dbpearsonmlis.com




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Kevin,

don't know yet since don't know how to unzip the file...bz2?...in any case,
I'm guessing that there is no post transformation editing that most
libraries would insist upon...eg, subject headings in the metadata are
strings with hyphens separating subjects from subheadings and spatial,
temporal, genre subfields have to be introduced...some content needs to go
into 600,610, 611,630,651 fields...for more on the post transform editing
see:

http://dbpearsonmlis.com/GPmetadata.html

dana


On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

 Hi Dana,

 Out of curiosity, how does your crosswalk differ from Project Gutenberg's
 MARC files?  See, e.g.:


 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

 Yours,
 Kevin

 --
 Kevin Ford
 Network Development and MARC Standards Office
 Library of Congress
 Washington, DC



  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Dana Pearson
  Sent: Tuesday, June 11, 2013 9:24 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] best way to make MARC files available to anyone
 
  I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I
  would like to make these files available to any library that is
  interested.
 
  I thought that I would put them on my website via FTP but don't know if
  that is the best way.  Don't have an ftp client myself so was thinking
  that that may be now passé.
 
  I tried using Google Drive with access available via the link to two
  versions of the files, UTF8 and MARC8.  However, it seems that that is
  not a viable solution.  I can access the files with the URLs provided
  by setting the access to anyone with the URL but doesn't work for some
  of those testing it for me or with the links I have on my webpage..
 
  I have five folders with files of about 38 MB total.  I have separated
  the ebooks, audio books, juvenile content, miscellaneous and non-Latin
  scripts such as Chinese, Modern Greek.  Most of the content is in the
  ebook folder.
 
  I would like to make access as easy as possible.
 
  Google Drive seems to work for me.  Here's the link to my page with the
  links in case you would like to look at the folders.  Works for me but
  not for everyone who's tried it.
 
  http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
 
  thanks,
  dana
 
  --
  Dana Pearson
  dbpearsonmlis.com




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
Kevin, Eric

7zip worked fine to unzip and records look pretty good since they used 653
and preserved the string from the metadata element with the hyphens.
 However the records do not do subfield d in 100 or 700 fields and thus
such content appears in the 245$c.  245$a seems to go missing with some
frequency.  MarcEdit does not report any errors though.

My original intent was just to keep my XSLT skills sharp while I had some
free time last August.  After creating the stylesheet, I then had no free
time until January when I could devote 2 or 3 hours to the post transform
editing.  Thought I'd just dive in but the pool was much deeper than I had
anticipated.

Do think libraries will prefer my edited versions although different in
non-access points as well.  Incidentally, not many additions since my
harvest.

First record in the Project Gutenberg produced records:

=LDR  00721cam a22002293a 4500
=001  27384
=003  PGUSA
=008  081202s2008xxu|s|000\|\eng\d
=040  \\$aPGUSA$beng
=042  \\$adc
=050  \4$aPQ
=100  1\$aDumas, Alexandre, 1802-1870
=245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260  \\$bProject Gutenberg,$c2008
=500  \\$aProject Gutenberg
=506  \\$aFreely available.
=516  \\$aElectronic text
=653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830  \0$aProject Gutenberg$v27384
=856  40$uhttp://www.gutenberg.org/etext/27384
=856  42$uhttp://www.gutenberg.org/license$3Rights

couldn't readily find the above item but here's an example of my records by
the same author.

=LDR  01002nam a22002535  4500
=001  PG18997
=006  md
=007  cr||n\|||muaua
=008  \\s2006utu|o|||eng\d
=042  \\$adc
=090  \\$aPQ
=092  \0$aeBooks
=100  1\$aDumas, Alexandre,$d1802-1870.
=245  14$aThe Vicomte de Bragelonne$h[electronic resource] :$bOr Ten Years
Later being the completion of The Three Musketeers And Twenty Years
After /$Alexandre Dumas.
=260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive
Foundation,$c2006.
=300  \\$a1 online resource :$bmultiple file formats.
=500  \\$aRecords generated from Project Gutenberg RDF data.
=540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
=650  \0$aAdventure stories.
=650  \0$aHistorical fiction.
=651  \0$aFrance$vHistory$yLouis XIV, 1643-1715$vFiction.
=655  \0$aElectronic books.
=710  2\$aProject Gutenberg.
=856  40$uhttp://www.gutenberg.org/etext/18997$zClick to access.

thanks for your interest..

regards,
dana


On Wed, Jun 12, 2013 at 9:10 AM, Ford, Kevin k...@loc.gov wrote:

 Hi Dana,

 Out of curiosity, how does your crosswalk differ from Project Gutenberg's
 MARC files?  See, e.g.:


 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs#MARC_Records_.28automatically_generated.29

 Yours,
 Kevin

 --
 Kevin Ford
 Network Development and MARC Standards Office
 Library of Congress
 Washington, DC



  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Dana Pearson
  Sent: Tuesday, June 11, 2013 9:24 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] best way to make MARC files available to anyone
 
  I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I
  would like to make these files available to any library that is
  interested.
 
  I thought that I would put them on my website via FTP but don't know if
  that is the best way.  Don't have an ftp client myself so was thinking
  that that may be now passé.
 
  I tried using Google Drive with access available via the link to two
  versions of the files, UTF8 and MARC8.  However, it seems that that is
  not a viable solution.  I can access the files with the URLs provided
  by setting the access to anyone with the URL but doesn't work for some
  of those testing it for me or with the links I have on my webpage..
 
  I have five folders with files of about 38 MB total.  I have separated
  the ebooks, audio books, juvenile content, miscellaneous and non-Latin
  scripts such as Chinese, Modern Greek.  Most of the content is in the
  ebook folder.
 
  I would like to make access as easy as possible.
 
  Google Drive seems to work for me.  Here's the link to my page with the
  links in case you would like to look at the folders.  Works for me but
  not for everyone who's tried it.
 
  http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
 
  thanks,
  dana
 
  --
  Dana Pearson
  dbpearsonmlis.com




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Dana Pearson
quick followup on the thread..

github:  I looked at the cooperhewitt collection but don't see a way to
download the content...I could copy and paste their content but that may
not be the best approach for my files...documentation is thin, seems i
would have to provide email addresses for those seeking access...but
clearly that is not the case with how the cooperhewitt archive is
configured..

My primary concern has been to make it as simple a process as possible for
libraries which have limited technical expertise.  One of the reasons I
made a career change was my inability as a library director to integrate
very useful online resources in the library's content discovery system.
 Each of the libraries I led lacked expertise and/or the technical support
necessary to do so.  So, quit my job, re-tooled and now working
independently.

Internet Archive:  I did a search that included a query term MARC and found
the Open Library and this may be the best option but I will have to include
a field in each record I think...something I could easily do...the marc
records do download nicely...I'll send a message for guidance on this

Eric's suggestion regarding MIME type is interesting as well but seems I
would have to have a recognizable type like zip...would prefer to have the
files no larger than 4000 or so records to facilitate processing...there
is also some content libraries may not want...eg, erotic literature,
juvenile content..

found the file for comparison with GP generated MARC:

=LDR  00945nam a22002535  4500
=001  PG27384
=006  md
=007  cr||n\|||muaua
=008  \\s2008utu|o|||eng\d
=042  \\$adc
=090  \\$aPQ
=092  \0$aeBooks
=100  1\$aDumas, Alexandre,$d1802-1870.
=240  14$aUne fille du régent.$lEnglish
=245  14$aThe Regent's Daughter$h[electronic resource] /$cAlexandre Dumas.
=260  \\$aSalt Lake City :$bProject Gutenberg Literary Archive
Foundation,$c2008.
=300  \\$a1 online resource :$bmultiple file formats.
=500  \\$aRecords generated from Project Gutenberg RDF data.
=540  \\$aApplicable license:$uhttp://www.gutenberg.org/license
=600  10$aOrléans, Philippe,$cduc d',$d1674-1723$vFiction.
=651  \0$aFrance$xHistory$yRegency, 1715-1723$vFiction.
=655  \0$aElectronic books.
=710  2\$aProject Gutenberg.
=856  40$uhttp://www.gutenberg.org/etext/27384$zClick to access.

Gutenberg Project MARC:

=LDR  00721cam a22002293a 4500
=001  27384
=003  PGUSA
=008  081202s2008xxu|s|000\|\eng\d
=040  \\$aPGUSA$beng
=042  \\$adc
=050  \4$aPQ
=100  1\$aDumas, Alexandre, 1802-1870
=245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
=260  \\$bProject Gutenberg,$c2008
=500  \\$aProject Gutenberg
=506  \\$aFreely available.
=516  \\$aElectronic text
=653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
=653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
=830  \0$aProject Gutenberg$v27384
=856  40$uhttp://www.gutenberg.org/etext/27384
=856  42$uhttp://www.gutenberg.org/license$3Rights

thanks again,
dana


On Wed, Jun 12, 2013 at 6:19 PM, Dana Pearson dbpearsonm...@gmail.comwrote:

 Kevin, Eric

 7zip worked fine to unzip and records look pretty good since they used 653
 and preserved the string from the metadata element with the hyphens.
  However the records do not do subfield d in 100 or 700 fields and thus
 such content appears in the 245$c.  245$a seems to go missing with some
 frequency.  MarcEdit does not report any errors though.

 My original intent was just to keep my XSLT skills sharp while I had some
 free time last August.  After creating the stylesheet, I then had no free
 time until January when I could devote 2 or 3 hours to the post transform
 editing.  Thought I'd just dive in but the pool was much deeper than I had
 anticipated.

 Do think libraries will prefer my edited versions although different in
 non-access points as well.  Incidentally, not many additions since my
 harvest.

 First record in the Project Gutenberg produced records:

 =LDR  00721cam a22002293a 4500
 =001  27384
 =003  PGUSA
 =008  081202s2008xxu|s|000\|\eng\d
 =040  \\$aPGUSA$beng
 =042  \\$adc
 =050  \4$aPQ
 =100  1\$aDumas, Alexandre, 1802-1870
 =245  10$a$h[electronic resource] /$cby Alexandre, 1802-1870 Dumas
 =260  \\$bProject Gutenberg,$c2008
 =500  \\$aProject Gutenberg
 =506  \\$aFreely available.
 =516  \\$aElectronic text
 =653  \0$aFrance -- History -- Regency, 1715-1723 -- Fiction
 =653  \0$aOrléans, Philippe, duc d', 1674-1723 -- Fiction
 =830  \0$aProject Gutenberg$v27384
 =856  40$uhttp://www.gutenberg.org/etext/27384
 =856  42$uhttp://www.gutenberg.org/license$3Rights

 couldn't readily find the above item but here's an example of my records
 by the same author.

 =LDR  01002nam a22002535  4500
 =001  PG18997
 =006  md
 =007  cr||n\|||muaua
 =008  \\s2006utu|o|||eng\d
 =042  \\$adc
 =090  \\$aPQ
 =092  \0$aeBooks
 =100  1\$aDumas, Alexandre,$d1802-1870.
 =245  14$aThe Vicomte de Bragelonne$h[electronic

[CODE4LIB] best way to make MARC files available to anyone

2013-06-11 Thread Dana Pearson
I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I would
like to make these files available to any library that is interested.

I thought that I would put them on my website via FTP but don't know if
that is the best way.  Don't have an ftp client myself so was thinking that
that may be now passé.

I tried using Google Drive with access available via the link to two
versions of the files, UTF8 and MARC8.  However, it seems that that is not
a viable solution.  I can access the files with the URLs provided by
setting the access to anyone with the URL but doesn't work for some of
those testing it for me or with the links I have on my webpage..

I have five folders with files of about 38 MB total.  I have separated the
ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts
such as Chinese, Modern Greek.  Most of the content is in the ebook folder.

I would like to make access as easy as possible.

Google Drive seems to work for me.  Here's the link to my page with the
links in case you would like to look at the folders.  Works for me but not
for everyone who's tried it.

http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html

thanks,
dana

-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] LOC Subject Headings API

2013-06-04 Thread Dana Pearson
Joshua,

There are different formats at LOC:

http://id.loc.gov/authorities/subjects.html
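
If the bulk download route is taken, a sketch of what extracting headings for an autosuggest index might look like. This assumes the SKOS RDF/XML serialization, where each heading appears as a skos:Concept with a skos:prefLabel; verify against the actual file from id.loc.gov before relying on it.

<!-- Sketch only: emit one "label TAB uri" line per heading from an LCSH
     SKOS RDF/XML download. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//skos:Concept[skos:prefLabel]">
      <xsl:value-of select="concat(skos:prefLabel[1], '&#9;', @rdf:about, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>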

dana


On Tue, Jun 4, 2013 at 6:31 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 I am building an auto-suggest feature into our library's search box, and I
 am wanting to include LOC subject headings in my suggestions list. Does
 anyone know of any web service that allows for automated harvesting of LOC
 Subject Headings? I am also looking for name authorities, for that matter.
 Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I have
 spent a while Googling with no luck, but this seems like the sort of
 general-purpose thing that a lot of people would be interested in. I feel
 like I must be missing something. Any help is appreciated.

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Dana Pearson
i'm not a coder but i undertook a study of XML some years after it
came onto the scene, with a likely confused notion that it would be
the next significant technology. I learned some XSL and later was able
to weave PubMed Central journal information (CSV transformed into XML)
together with Dublin Core metadata of journal articles into MARCXML
during harvest with MarcEdit (which the inestimable Terry Reese
continues to tweak).  Also used the same XML journal data to augment
NLM journal records with PubMed Central holdings and other data with
a transform in my IDE, though it took me weeks to get right...so, no
aspirations to become a coder.

Probably did not get all of the MARC cataloging rules right and I can
empathize with those who come to MARC and cataloging standards without
cataloging training, experience. My library experience was primarily
as library director...my expertise on library specializations would
always be under question.

regards,
dana








-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] newbie

2010-03-24 Thread Dana Pearson
I've been focusing on XSL and XQuery, but Python's on my list to do,
although I want to do a turn in Perl first; very versatile.
Just a JavaScript background.

regards,
dana

On Wed, Mar 24, 2010 at 2:24 PM, jenny jennynotanyd...@gmail.com wrote:
 A newly-minted library school grad who has up to this point focused my
 studies on Rare Books and Book Arts, I've been interested in getting
 back into some programming--I took two classes in college
 (VisualBASIC), have a smattering of web design and PHP and MySQL
 exposure, but I'd like to try my hand at teaching myself a language in
 my free time. My partner is a former dotcom programmer (now studying
 neuroscience) and has offered to assist when needed, so I'm not
 completely on my own (thank goodness).

 My question is, where would you recommend I would begin? What's hot
 right now in the library world? Python, PERL, Ruby? Any advice you'd
 have for a beginner like me or even recommendations for online courses
 would be extremely appreciated.

 JC




-- 
Dana Pearson
dbpearsonmlis.com


Re: [CODE4LIB] exploiting z39.50

2009-05-09 Thread Dana Pearson
On 5/8/09, Xiaoming Liu xiaoming@gmail.com wrote:
 On Fri, May 8, 2009 at 3:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 I wonder how xID handles superseded OCLCnums, if it'll still successfully
 find the right matches for you?


 This is documented in
 http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#deleted


  Worldcat uses the OCLC Control Number Cross-Reference to track deleted OCLC
  numbers. When an OCLC number is deleted, it's still searchable from this
  service. In the response, we use presentOclcnum to specify the present OCLC
  number. For example, 2416076 was merged into 24991049, so a request for the
  deleted number 2416076 will return:

    <rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">
      <oclcnum lccn="34025476" presentOclcnum="24991049">2416076</oclcnum>
    </rsp>


 The presentOclcnum field is omitted when an OCLC number is active, so
 request to current OCLC number 24991049 returns:

    <rsp xmlns="http://worldcat.org/xid/xoclcnum/" stat="ok">
      <oclcnum lccn="34025476">24991049</oclcnum>
    </rsp>
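
A small XSLT 2.0 illustration of consuming that response: prefer the presentOclcnum attribute when it is present, otherwise keep the number as returned. The namespace is the one shown in the examples above; the output element name is made up.

<!-- Resolve each returned number to its current value. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://worldcat.org/xid/xoclcnum/">

  <xsl:template match="x:rsp[@stat = 'ok']">
    <xsl:for-each select="x:oclcnum">
      <current-oclcnum>
        <!-- Attribute wins when present; otherwise the element's own value. -->
        <xsl:value-of select="(@presentOclcnum, normalize-space(.))[1]"/>
      </current-oclcnum>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>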


 Xiaoming





 Ray Denenberg, Library of Congress wrote:

 From: Eric Lease Morgan emor...@nd.edu


  1. What MARC field/subfield might I put this string?
  2. How would I go about getting the string indexed?
  3. How might I go about querying the server for records with this
 string?



 I can at least talk about the third question.  There was work on a marc
 attribute set, though not completed.  If you look at the oid register at
 http://www.loc.gov/z3950/agency/defns/oids.html you'll see that the
 latest work on it (second draft) was in 2000,
 http://www.nlc-bnc.ca/iso/z3950/MARC_attribute_set_2.doc. So if someone
 actually wanted to put it to use it would have to be completed.

 For SRU there is a complete marc context set,
 http://www.loc.gov/standards/sru/resources/marc-context-set.html.

 --Ray







Re: [CODE4LIB] MARC-XML - Qualified Dublin Core XSLT

2009-03-06 Thread Dana Pearson
try:

http://imlsdcc.grainger.uiuc.edu/docs/stylesheets/GeneralMARCtoQDC.xsl

I searched the file title (not complete path) in Google.

regards,

Dana Pearson

On Fri, Mar 6, 2009 at 2:03 PM, Walker, David dwal...@calstate.edu wrote:

 Hi All,

 Anyone have an XSLT style sheet to convert from MARC-XML to Qualified
 Dublin Core?

 I'm looking to load these into DSpace, if that makes a difference.  Looks
 like LOC only has MARC-XML to Simple Dublin Core.  This page [1] mentions
 'MARCXML to Qualified DC' style sheets developed at the University of
 Illinois, but the links are dead.

 --Dave

 [1] http://cicharvest.grainger.uiuc.edu/schemas.asp

 ==
 David Walker
 Library Web Services Manager
 California State University
 http://xerxes.calstate.edu