Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
Thank you Roy and Simon for the info. As for your second point, I suppose one advantage of using the WorldCat API at this experimental stage is that the returned bib records are already FRBR-ized.

Ross - thanks for the link to the Open Library data dump. The WorldCat collection is two orders of magnitude larger than Open Library, which makes a significant difference considering the skewness and sparsity of bib records classified according to library taxonomies, e.g., DDC and LCC (for more info, see: http://cdm15003.contentdm.oclc.org/cdm/singleitem/collection/p267701coll27/id/277/rec/28).

Thanks,
Arash

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero
Sent: 22 May 2012 19:47
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

Arash - you might not want to use a straight dump of WorldCat catalog records - at least not without the associated holdings information.* There are a lot of quasi-duplicate records that are sufficiently broken that the WorldCat de-duplication algorithm refuses to merge them. These records will usually only be used by a handful of institutions; the better records will tend to have more associated holdings. The holdings count should be used to weight the strength of association between class numbers and features.

Also, since classification/categorization is usually considered to be a property of works rather than manifestations, one might get better results by using Work sets for training. I would suggest, er, contacting Thom Hickey.

Simon

* Well, not precisely holdings - you just need the number of distinct institutions with at least one copy. I call them 'hasings'.

On Sat, May 19, 2012 at 8:42 PM, Roy Tennant wrote:
> Arash,
> Yes, we have made WorldCat available to researchers under a special
> license agreement. I suggest contacting Thom Hickey about such an
> arrangement. Thanks,
> Roy
>
> On Fri, May 18, 2012 at 3:46 AM, Arash.Joorabchi wrote:
> > Dear Karen,
> >
> > I am conducting a research experiment on automatic text classification
> > and I am trying to retrieve the top matching bib records (which include
> > DDC fields) for a set of keyphrases extracted from a given document. So,
> > I suppose this is a rather exceptional use case. In fact, the right
> > approach for this experiment is to process the full dump of the WorldCat
> > database directly rather than sending a limited number of queries via
> > the API.
> >
> > I read here:
> > http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
> > that WorldCat might become available as open linked data in the future,
> > which would solve my problem and help similar text mining projects.
> > However, I wonder if it is currently available to researchers under a
> > research/non-commercial use license agreement.
> >
> > Regards,
> > Arash
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coombs
> > Sent: 17 May 2012 08:37
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
> >
> > I forwarded this thread to the Product Manager for the WorldCat Search
> > API. She responded that unfortunately this query is not possible using
> > the API at this time.
> >
> > FYI, the SRU interface to the WorldCat Search API doesn't currently
> > support any scan-type searches either.
> >
> > Is there a particular use case you're trying to support? Knowing that
> > would help us document this as a possible enhancement.
> >
> > Karen
> >
> > Karen Coombs
> > Senior Product Analyst
> > Web Services
> > OCLC
> > coom...@oclc.org
> >
> > On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi wrote:
> >> Hi Andy,
> >>
> >> I am an SRU newbie myself, so I don't know how this could be achieved
> >> using scan operations and could not find much info on the SRU website
> >> (http://www.loc.gov/standards/sru/).
> >>
> >> As for the wildcards, according to this guide:
> >> http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
> >> the symbols should be preceded by at least 3 characters, and therefore
> >> clauses like:
> >>
> >> ... AND srw.dd=*
> >> ... AND srw.dd=?.*
> >> ... AND srw.dd=###.*
> >> ... AND srw.dd=?3.*
> >>
> >> do not work and result in the following error:
> >>
> >> Diagnostics
> >> Identifier: info:srw/diagnostic/1/9
> >> Meaning:
> >> Details:
> >> Message: Not enough chars in truncated term: Truncated words too short (9)
> >>
> >> Thanks,
> >> Arash
> >>
> >> From: Houghton,Andrew [mailto:hough...@oclc.org]
> >> Sent: 16 May 2012 11:58
> >> To: Arash.Joorabchi
> >> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
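[Editor's note] Since the API cannot exclude records without a DDC number server-side, one workaround raised later in this thread is to retrieve candidate records and filter them on the client side. Below is a minimal sketch of that idea in Python. The SRU endpoint URL, the wskey parameter, and the example query string are assumptions based on OCLC's WorldCat Search API documentation of the period and should be checked against the current docs; the only thread-grounded part is the filtering logic, which treats MARC field 082 as the marker for a Dewey number.

```python
"""Query a WorldCat-style SRU endpoint and keep only records with a DDC number.

Endpoint, wskey, and query below are placeholders (assumptions, not from the thread).
"""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "http://www.worldcat.org/webservices/catalog/search/sru"  # assumed endpoint
WSKEY = "YOUR_WSKEY_HERE"                                             # hypothetical API key
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def search(query, max_records=50):
    """Run one SRU searchRetrieve request and return the parsed XML root."""
    params = {
        "query": query,
        "maximumRecords": str(max_records),
        "recordSchema": "marcxml",
        "wskey": WSKEY,
    }
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())


def records_with_ddc(root):
    """Yield (record, DDC numbers) for MARCXML records that carry an 082 field."""
    for rec in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        ddc_fields = rec.findall('marc:datafield[@tag="082"]', MARC_NS)
        if ddc_fields:
            numbers = [sf.text for f in ddc_fields
                       for sf in f.findall('marc:subfield[@code="a"]', MARC_NS)]
            yield rec, numbers


if __name__ == "__main__":
    root = search('srw.kw="text classification"')  # placeholder query
    for rec, ddc_numbers in records_with_ddc(root):
        print(ddc_numbers)
```

The trade-off, as noted in the thread, is that the filtering happens after retrieval, so records without a Dewey number still count against the query's result limit.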
Re: [CODE4LIB] archiving a wiki
Yup, something like that! But for JSPwiki :) JSPwiki has an extension to export as PDF, but it doesn't do multiple pages without some extra work each time. We're hoping to find something quick and automated so we can archive quickly and move on!

>>> Dave Caroline 5/22/2012 4:08 PM >>>
On Tue, May 22, 2012 at 10:04 PM, Carol Hassler wrote:
> My organization would like to archive/export our internal wiki in some
> kind of end-user friendly format. The concept is to copy the wiki
> contents annually to a format that can be used on any standard computer
> in case of an emergency (i.e. saved as an HTML web-style archive, saved
> as PDF files, saved as Word files).

something like ?
http://www.mediawiki.org/wiki/Extension:DumpHTML

Dave Caroline
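[Editor's note] For the "quick and automated" HTML case, a generic mirroring tool can be pointed at the wiki even without JSPwiki-specific export support. A minimal sketch, assuming wget is installed and using a placeholder wiki URL (not from the thread):

```python
"""Mirror a wiki to a folder of static HTML by shelling out to wget.

The base URL is a placeholder; the wget flags are standard mirroring options,
but behaviour should be tested against your own JSPwiki instance.
"""
import subprocess
from datetime import date

WIKI_URL = "https://intranet.example.org/wiki/"          # hypothetical wiki root
OUT_DIR = f"wiki-archive-{date.today():%Y-%m-%d}"

subprocess.run(
    [
        "wget",
        "--mirror",            # recurse and honour timestamps
        "--convert-links",     # rewrite links so the copy works offline
        "--page-requisites",   # grab CSS, images, attachments
        "--adjust-extension",  # save pages with .html extensions
        "--no-parent",         # stay inside the wiki path
        "--directory-prefix", OUT_DIR,
        WIKI_URL,
    ],
    check=True,
)
print(f"Static copy written to {OUT_DIR}/")
```

Run annually (e.g., from cron or Task Scheduler), this produces a browsable offline snapshot that any standard computer can open without the wiki software.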
[CODE4LIB] archiving a wiki
My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files). Another way to put it is that we are looking for a way to export the contents of the wiki into a printer-friendly format - a document that maintains some organization and formatting and can be used on any standard computer.

Is anybody aware of a tool out there that would allow for this sort of automated, multi-page export? Our wiki is large and we would prefer not to do this type of backup one page at a time. We are using JSPwiki, but I'm open to any option you think might work. Could any of the web harvesting products be adapted to do the job? Has anyone else backed up a wiki to an alternate format?

Thanks!

Carol Hassler
Webmaster / Cataloger
Wisconsin State Law Library
(608) 261-7558
http://wilawlibrary.gov/
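[Editor's note] For the printer-friendly/PDF side of the question, an already-harvested folder of HTML pages can be merged into a single PDF with an external converter. A minimal sketch, assuming wkhtmltopdf is installed and the archive folder name is a placeholder (both are assumptions, not from the thread):

```python
"""Turn an archived folder of HTML pages into one printable PDF via wkhtmltopdf.

Paths are placeholders; very large wikis may need to be split into several PDFs.
"""
import pathlib
import subprocess

ARCHIVE_DIR = pathlib.Path("wiki-archive-2012-05-23")  # hypothetical mirrored folder
OUTPUT_PDF = "wiki-archive.pdf"

# Sort pages so the PDF has a predictable order.
pages = sorted(str(p) for p in ARCHIVE_DIR.rglob("*.html"))
if not pages:
    raise SystemExit(f"No HTML pages found under {ARCHIVE_DIR}")

# wkhtmltopdf accepts multiple input pages followed by one output file.
subprocess.run(["wkhtmltopdf", *pages, OUTPUT_PDF], check=True)
print(f"Wrote {OUTPUT_PDF} from {len(pages)} wiki pages")
```

The result keeps basic formatting and page organization, which fits the "usable on any standard computer" requirement without depending on the wiki software itself.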
[CODE4LIB] Job: NC LIVE Web & User Experience Development Librarian at NC LIVE
NC LIVE Web & User Experience Development Librarian Vacancy Announcement

NC LIVE seeks candidates with a passion for helping people find, use, and create information that improves their communities and their lives through digital library collections and services. We need your enthusiasm, ideas, and unique skills to expand the digital possibilities of the state's academic and public libraries. If you are looking to join an organization with a track record of success that will use your ideas to help build the next generation of libraries in North Carolina, come join us at NC LIVE.

Known for its leadership in collaborative online library success, NC LIVE seeks an innovative, curious, and flexible colleague to join a seven-member team of librarians and information technology professionals serving the state from NCSU Libraries on the campus of NC State University. This newly created position reports to the NC LIVE Executive Director.

Responsibilities

The Web/UXD Librarian will have primary responsibility to design, develop, and maintain the web and mobile interfaces of NC LIVE's digital library services and collections.

Web and Application Development
* Provide hands-on leadership and vision in the development, support, integration, and administration of NC LIVE's digital library websites, portals, and discovery systems
* Conduct ongoing research into the development of new digital library interface capabilities, enhancements, and user-centered design trends
* Partner with member library staff to maximize access to and use of NC LIVE's digital collections through creative web design that improves the patron's discovery experience
* Provide frontline support for digital library services through NC LIVE's content management systems and library relations systems
* Participate in consortial planning by serving on committees, task forces, and teams

Member Relations and Outreach Support
* Track and identify trends in use and user behavior to assist colleagues in building orientation, awareness, and training initiatives for member libraries
* Provide digital library consulting services to member libraries
* Build relationships with member librarians and digital library service vendors to ensure the best match of service to organizational needs

Qualifications

Required:
* ALA-accredited MLS, or equivalent degree in library or information science
* Relevant experience, including design and development of digital applications and library services in a public, academic, or special library environment
* Knowledge of and experience with current and emerging web development technologies as they contribute to digital library services and the user experience of students, library patrons, or researchers
* Demonstrated commitment to creative, high-quality digital library services
* Evidence of ability for ongoing professional development and contribution
* Knowledge of data standards prevalent in libraries
* Relevant customer service experience in a library, educational institution, or other knowledge-based organization
* Ability to work and excel in both individual and team environments
* Valid driver's license

Preferred:
* Previous digital library development experience in a library, educational institution, or other knowledge-based organization
* Experience with search engine technologies in a library or university environment
* Experience addressing usability issues and with user-centered design in library environments
* Demonstrated experience retrieving data from open web APIs

Overview of NC LIVE

NC LIVE is North Carolina's statewide online library service. Founded in 1997 by representatives from the NC Community Colleges, the NC Independent Colleges and Universities, the NC Public Library Directors Association, the University of North Carolina, and the State Library of North Carolina, NC LIVE serves nearly 200 member libraries across North Carolina and is dedicated to helping its member libraries provide North Carolinians with resources that support education, enhance statewide economic development, and increase quality of life. Designed for at-home use, NC LIVE eBooks, magazines, newspapers, journals, media, and other online materials are available from any Internet connection via library websites and through www.nclive.org. NC LIVE offers free electronic access to resources for all ages on topics ranging from careers, business, and investing, to auto repair, health, history, and genealogy. NC LIVE resources are available to all North Carolinians through their local public, community college, or academic library. More information about NC LIVE can be found at: http://www.nclive.org/about

Salary and Benefits

Salary is very competitive, commensurate with education and experience. Position is non-tenure track faculty at the rank of Librarian. Benefits include: 24 days vacation, 12 days sick leave; State of NC comprehensive major medical insurance, and state,
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
Arash - you might not want to use a straight dump of WorldCat catalog records - at least not without the associated holdings information.* There are a lot of quasi-duplicate records that are sufficiently broken that the WorldCat de-duplication algorithm refuses to merge them. These records will usually only be used by a handful of institutions; the better records will tend to have more associated holdings. The holdings count should be used to weight the strength of association between class numbers and features.

Also, since classification/categorization is usually considered to be a property of works rather than manifestations, one might get better results by using Work sets for training. I would suggest, er, contacting Thom Hickey.

Simon

* Well, not precisely holdings - you just need the number of distinct institutions with at least one copy. I call them 'hasings'.

On Sat, May 19, 2012 at 8:42 PM, Roy Tennant wrote:
> Arash,
> Yes, we have made WorldCat available to researchers under a special
> license agreement. I suggest contacting Thom Hickey about such an
> arrangement. Thanks,
> Roy
>
> On Fri, May 18, 2012 at 3:46 AM, Arash.Joorabchi wrote:
> > Dear Karen,
> >
> > I am conducting a research experiment on automatic text classification
> > and I am trying to retrieve the top matching bib records (which include
> > DDC fields) for a set of keyphrases extracted from a given document. So,
> > I suppose this is a rather exceptional use case. In fact, the right
> > approach for this experiment is to process the full dump of the WorldCat
> > database directly rather than sending a limited number of queries via
> > the API.
> >
> > I read here:
> > http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
> > that WorldCat might become available as open linked data in the future,
> > which would solve my problem and help similar text mining projects.
> > However, I wonder if it is currently available to researchers under a
> > research/non-commercial use license agreement.
> >
> > Regards,
> > Arash
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coombs
> > Sent: 17 May 2012 08:37
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
> >
> > I forwarded this thread to the Product Manager for the WorldCat Search
> > API. She responded that unfortunately this query is not possible using
> > the API at this time.
> >
> > FYI, the SRU interface to the WorldCat Search API doesn't currently
> > support any scan-type searches either.
> >
> > Is there a particular use case you're trying to support? Knowing that
> > would help us document this as a possible enhancement.
> >
> > Karen
> >
> > Karen Coombs
> > Senior Product Analyst
> > Web Services
> > OCLC
> > coom...@oclc.org
> >
> > On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi wrote:
> >> Hi Andy,
> >>
> >> I am an SRU newbie myself, so I don't know how this could be achieved
> >> using scan operations and could not find much info on the SRU website
> >> (http://www.loc.gov/standards/sru/).
> >>
> >> As for the wildcards, according to this guide:
> >> http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
> >> the symbols should be preceded by at least 3 characters, and therefore
> >> clauses like:
> >>
> >> ... AND srw.dd=*
> >> ... AND srw.dd=?.*
> >> ... AND srw.dd=###.*
> >> ... AND srw.dd=?3.*
> >>
> >> do not work and result in the following error:
> >>
> >> Diagnostics
> >> Identifier: info:srw/diagnostic/1/9
> >> Meaning:
> >> Details:
> >> Message: Not enough chars in truncated term: Truncated words too short (9)
> >>
> >> Thanks,
> >> Arash
> >>
> >> From: Houghton,Andrew [mailto:hough...@oclc.org]
> >> Sent: 16 May 2012 11:58
> >> To: Arash.Joorabchi
> >> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
> >> without a DDC no from the result set
> >>
> >> I'm not an SRU guru, but is it possible to do a scan and look for
> >> postings of zero?
> >>
> >> Andy.
> >>
> >> On May 16, 2012, at 6:39, "Arash.Joorabchi" wrote:
> >>
> >>    Hi Mark,
> >>
> >>    Srw.dd=* does not work either:
> >>
> >>    Identifier: info:srw/diagnostic/1/27
> >>    Meaning:
> >>    Details: srw.dd
> >>    Message: The index [srw.dd] did not include a searchable value
> >>
> >>    I suppose the only option left is to retrieve everything and
> >>    filter the results on the client side.
> >>
> >>    Thanks for your quick reply.
> >>    Arash
> >>
> >>    -----Original Message-----
> >>    From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
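[Editor's note] Simon's suggestion to weight the association between class numbers and features by holdings counts could look roughly like the sketch below. The record structure (a DDC class, a bag of feature terms, and a count of distinct holding institutions) is invented for illustration; the thread only supplies the weighting idea itself.

```python
"""Holdings-weighted association between DDC classes and text features.

The input records are hypothetical examples; only the weighting scheme
follows the suggestion in the thread.
"""
from collections import defaultdict

records = [
    {"ddc": "025.04", "features": ["digital", "libraries"], "holdings": 412},
    {"ddc": "025.04", "features": ["metadata"], "holdings": 3},
    {"ddc": "006.35", "features": ["text", "classification"], "holdings": 57},
]

# weights[feature][ddc_class] accumulates holdings-weighted evidence, so a
# widely held record counts for more than a near-duplicate held by only a
# handful of institutions.
weights = defaultdict(lambda: defaultdict(float))
for rec in records:
    for feature in rec["features"]:
        weights[feature][rec["ddc"]] += rec["holdings"]


def best_class(feature):
    """Return the DDC class most strongly associated with a feature, if any."""
    classes = weights.get(feature)
    return max(classes, key=classes.get) if classes else None


print(best_class("libraries"))  # -> 025.04
```

Grouping the records into Work sets before weighting, as Simon also suggests, would simply mean summing the holdings of all manifestations of a work before the per-feature accumulation.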
[CODE4LIB] FW: Job Posting (Library Technician) South Bay (Los Angeles County)
Apologies for the cross postings . . .

LAC Group seeks a Library Technician for a part-time, temporary, 3-month position at a corporate library in the South Bay (Los Angeles County). This position reports to the library's Technical Services Manager.

Responsibilities:
* Add online access to company reports in the library catalog to the company's Knowledge Management System
* Load, define metadata, and add links to newly scanned internally-generated technical reports
* File maintenance in the digital library
* Descriptive cataloging
* Database clean-up

Qualifications:
* Previous technical library experience
* Excellent attention to detail
* Experience using an integrated library system
* Experience using a corporate document management system

To apply, please visit http://goo.gl/wVK4v

LAC Group is an Equal Opportunity / Affirmative Action Employer who values diversity in the workplace.