Re: [CODE4LIB] Radioactive records for Solr
On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:

> Here's the set that I generated a while ago - it's quite big as it
> covers the full Marc21 field and subfield set for bibliographic records.
> I'm releasing these under the terms of our Talis Community License.
> (http://www.talis.com/tdn/tcl)

IANAL, so to clarify this license, would it be ok for me to check this into Solr's repository at Apache (keeping the license file alongside)?

I'm not quite sure what we'd do with this data just yet, as it looked like gibberish at first blush, but looking at the document Peter linked to it is by definition supposed to be this way and not overlap with real data.

> Would people be interested in a write-up of how we've used RadioactiveMarc
> and automated tests to validate Bath and US National Profile compliance?

Absolutely. This would certainly factor into my Solr efforts in crafting more automated tests.

Erik
Re: [CODE4LIB] Radioactive records for Solr
Yes, it would be ok to check this in with a copy of the license alongside it - or even just a readme with a link to the license would be fine.

The approach is a formulaic one: there are 10 records in there, each built up using the same algorithm. You can add these to a store of legitimate marc data with very little chance of them appearing in search results unless searched for specifically.

Taking just one field out of one record (in MarcEdit syntax)...

=027 \\$ara0271a1r ra0271a2r ra0271a3r$zra0271z1r ra0271z2r ra0271z3r$6ra027161r ra027162r ra027163r$8ra027181r ra027182r ra027183r

Each subfield contains 3 tokens (words); for subfield a these are: ra0271a1r ra0271a2r ra0271a3r. The 'r' at the start and end of each token is a padding character for testing truncation, 'a' is the record type, '027' the field, '1' the occurrence of that field, 'a' the subfield code, and '1', '2', '3' the occurrence of the token within the subfield. This allows all of the truncation, word, phrase, completeness and position combinations to be tested separately - with just one record coming back for each.

rob

Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257
msn: [EMAIL PROTECTED]
irc: irc.freenode.net/mrob,isnick

> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Erik Hatcher
> Sent: 09 February 2007 10:18
> To: CODE4LIB@listserv.nd.edu
> Subject: Re: [CODE4LIB] Radioactive records for Solr
>
> On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:
> > Here's the set that I generated a while ago - it's quite big as it
> > covers the full Marc21 field and subfield set for bibliographic
> > records.
> > I'm releasing these under the terms of our Talis Community License.
> > (http://www.talis.com/tdn/tcl)
>
> IANAL, so to clarify this license, would it be ok for me to check
> this into Solr's repository at Apache (keeping the license file
> alongside)?
> I'm not quite sure what we'd do with this data just yet, as it looked
> like gibberish at first blush, but looking at the document Peter
> linked to it is by definition supposed to be this way and not overlap
> with real data.
>
> > Would people be interested in a write-up of how we've used
> > RadioactiveMarc and automated tests to validate Bath and US National
> > Profile compliance?
>
> Absolutely. This would certainly factor into my Solr efforts in
> crafting more automated tests.
>
> Erik
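Rob's token scheme is regular enough to generate mechanically. Here is a minimal Python sketch of the algorithm as he describes it; the function names are mine, not part of the actual RadioactiveMarc tooling:

```python
def radioactive_tokens(record_type, field, occurrence, subfield, count=3):
    """Build the test tokens for one subfield: 'r' padding on each end,
    then record type, field tag, field occurrence, subfield code, and
    the token's position within the subfield."""
    return [
        f"r{record_type}{field}{occurrence}{subfield}{i}r"
        for i in range(1, count + 1)
    ]

def radioactive_subfield(record_type, field, occurrence, subfield):
    """Render one subfield in MarcEdit-style syntax."""
    tokens = radioactive_tokens(record_type, field, occurrence, subfield)
    return "$" + subfield + " ".join(tokens)

# Reproduce the $a subfield of Rob's =027 example:
print(radioactive_subfield("a", "027", 1, "a"))
# -> $ara0271a1r ra0271a2r ra0271a3r
```

Because every token encodes its own address (record type, field, occurrence, subfield, position), a search hit can be traced back to exactly one place in exactly one record.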
[CODE4LIB] Job Posting: Data Services Librarian, UCSD Libraries
THE UCSD LIBRARIES - University of California, San Diego
Data Services Librarian
Social Sciences & Humanities Library
Preferred appointment level: Assistant Librarian I - Librarian I, with an approximate salary range of $40,008-$66,756

The Libraries of the University of California, San Diego (UCSD) seek applications from innovative and user-oriented library professionals to join the enthusiastic staff of the Social Sciences and Humanities Library in the development of data services and collections. The UCSD Libraries are committed to making access to research information for faculty and students as efficient and convenient as technology, innovation, and resources will allow.

Responsibilities of the Position
Reporting to the Head for Data, Government and Geographic Information Services, assumes a leadership role in planning and developing data services and collections at UCSD. Evaluates, selects, and acquires data products for the Social Science Data Collection (SSDC). Provides consultations and instruction on finding and using data. Works closely with faculty to determine data needs and discovery tools. Actively collaborates with library liaisons to academic departments to develop and provide coordinated services to meet the full range of social science data needs at UCSD. Plans and implements staff training programs related to data. Serves as the liaison to ICPSR and DDI initiatives. Provides general reference assistance at a combined social sciences/humanities and government documents reference desk. Some evening and weekend reference hours required. Represents the UCSD Libraries at pertinent meetings and conferences. UCSD Librarians are expected to participate in library-wide planning and professional activities appropriate to their position. Appointment at the Librarian rank requires substantial relevant experience and superior qualifications.
Required Qualifications
Professional degree from a library school, other appropriate degree, or equivalent experience in one or more fields relevant to library services. Experience working with numeric data resources, including the ability to install and support relevant software applications. Ability to download and transform data files to meet the various needs of users. Proficiency with social sciences statistical packages and software, such as SAS and SPSS. Experience working with users to provide data services support. Ability to develop effective relationships with the library and campus IT departments. Demonstrated knowledge of a wide range of print and electronic reference and bibliographic tools in the social sciences. Experience in collection development activities in an academic library. Experience using government information. Superior organizational, analytical, and communication skills. Candidates must have a strong commitment to excellence in service and be able to work both independently and collaboratively in a complex, changing environment. Potential to excel in a dynamic, academically challenging environment. UCSD librarians are expected to participate in library-wide and system-wide planning and governance, and to be professionally active.

Desirable Qualifications
Undergraduate and/or graduate degree in the social sciences. Experience providing reference service in an academic or research library serving similar clientele. Experience with GIS data and applications.

For complete details, see http://orpheus.ucsd.edu/fac/SSHLDataLibrarian.htm. Application consideration begins March 23, 2007 and will continue until the position is filled.
Send application letter including a statement of qualifications, a full resume of education and relevant experience, and the names of at least three persons who are knowledgeable about your qualifications for this position to [EMAIL PROTECTED] or to UCSD, Stacey McDermaid - Library Human Resources, 9500 Gilman Drive Dept. 0175-H, La Jolla, CA 92093-0175. Telephone: 858.534.1279; Confidential Fax: 858.534.8634. Equal opportunity/affirmative action employer
Re: [CODE4LIB] Radioactive records for Solr
On Feb 9, 2007, at 2:58 AM, Rob Styles wrote: Would people be interested in a write-up of how we've used RadioactiveMarc and automated tests to validate Bath and US National Profile compliance? zOMG, yes! -Nate Wendt Library UW - Madison
Re: [CODE4LIB] Radioactive records for Solr
> ... the Zinterop records (which are described > in detail in that pdf but aren't available for > download anywhere I could find ... I sent Bill Moen an email asking if the records described in the report were available for download, and if not, could they be made available. -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Code for Libraries [mailto:[EMAIL PROTECTED] On > Behalf Of Binkley, Peter > Sent: Thursday, February 08, 2007 3:13 PM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: [CODE4LIB] Radioactive records for Solr > > In hunting for data to help model subject faceting for MARC > records, I've just been looking at Bill Moen's Zinterop > report > (http://www.unt.edu/zinterop/ZInterop2/Documents/ZInterop2Fina > lReport_we > m4Dec2005.pdf). It occurs to me that with all our various > projects working on indexing MARC records in Solr, we should > set up and distribute a set of "radioactive records" to use > in each project to diagnose and compare indexing and querying > behaviour. Probably we could just use the Zinterop records > (which are described in detail in that pdf but aren't > available for download anywhere I could find); but we might > want to enhance them with data suitable for testing our > faceting systems. Not sure what that would mean but I thought > I'd throw it out. > > If you were at Access '05, you heard Bill describe the Z39.50 > testing he was doing with radioactive records: records with > known unique values in all indexed fields, that could be used > for automated testing of Z39.50 search functionality. The > same approach might be very useful as we feel our way towards > a Solr MARC indexing system. > > Has anyone already done something like this? 
> > Peter > > Peter Binkley > Digital Initiatives Technology Librarian Information > Technology Services 4-30 Cameron Library University of > Alberta Libraries Edmonton, Alberta Canada T6G 2J8 > Phone: (780) 492-3743 > Fax: (780) 492-9243 > e-mail: [EMAIL PROTECTED] >
[CODE4LIB] Very large file uploads, PHP or possibly Perl
I have always depended on the kindness of strange PHP gurus. I am trying to rewrite a perpetually buggy system for uploading large PDF files (up to multiple tens of megabytes) via a web form. File uploads are very simple in PHP, but there's a default maximum file size of 2MB. Following various online hints I've found, I've gone into php.ini and goosed up the memory_limit, post_max_size, and upload_max_size (and restarted Apache), and added an appropriate hidden form input named MAX_FILE_SIZE. The 2MB limit is still in place. Is there something I overlooked? Or, any other suggestions for how to take in a very large file? [My current Perl version has a history of getting incomplete files in a non-negligible percentage of uploads. Weirdness ensues: whenever this happens, the file reliably cuts off at the same point, but the cutoff is not a fixed number of bytes, nor is it related to the size of the file.] -- Thomas Dowling [EMAIL PROTECTED]
Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl
I haven't needed to upload such large files, but I wonder if using the ftp functions in php would bypass this problem: http://us3.php.net/manual/en/ref.ftp.php Andrew On 2/9/07, Thomas Dowling <[EMAIL PROTECTED]> wrote: I have always depended on the kindness of strange PHP gurus. I am trying to rewrite a perpetually buggy system for uploading large PDF files (up to multiple tens of megabytes) via a web form. File uploads are very simple in PHP, but there's a default maximum file size of 2MB. Following various online hints I've found, I've gone into php.ini and goosed up the memory_limit, post_max_size, and upload_max_size (and restarted Apache), and added an appropriate hidden form input named MAX_FILE_SIZE. The 2MB limit is still in place. Is there something I overlooked? Or, any other suggestions for how to take in a very large file? [My current Perl version has a history of getting incomplete files in a non-negligible percentage of uploads. Weirdness ensues: whenever this happens, the file reliably cuts off at the same point, but the cutoff is not a fixed number of bytes, nor is it related to the size of the file.] -- Thomas Dowling [EMAIL PROTECTED] -- Andrew Darby Web Services Librarian Ithaca College Library http://www.ithaca.edu/library/ [EMAIL PROTECTED]
Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl
Pre-apologies if this suggestion is too "duh", but have you confirmed that you updated the correct php.ini file? One way to check is to create a temporary, web-accessible file (phpinfo.php) with the contents:

<?php phpinfo(); ?>

Load that in your browser and check that the path of the php.ini file apache is using matches the file you've updated. I haven't touched php since v4.x, so there may be an easier way to confirm this nowadays. Also, you should be sure to delete that phpinfo.php file when you're done so you're not unnecessarily publishing your server details.

--jay

On 2/9/07, Thomas Dowling <[EMAIL PROTECTED]> wrote: I have always depended on the kindness of strange PHP gurus. I am trying to rewrite a perpetually buggy system for uploading large PDF files (up to multiple tens of megabytes) via a web form. File uploads are very simple in PHP, but there's a default maximum file size of 2MB. Following various online hints I've found, I've gone into php.ini and goosed up the memory_limit, post_max_size, and upload_max_size (and restarted Apache), and added an appropriate hidden form input named MAX_FILE_SIZE. The 2MB limit is still in place. Is there something I overlooked? Or, any other suggestions for how to take in a very large file? [My current Perl version has a history of getting incomplete files in a non-negligible percentage of uploads. Weirdness ensues: whenever this happens, the file reliably cuts off at the same point, but the cutoff is not a fixed number of bytes, nor is it related to the size of the file.] -- Thomas Dowling [EMAIL PROTECTED]
Re: [CODE4LIB] Radioactive records for Solr
I just saw this message as well as the follow-ups today. I was one of the graduate assistants working on this phase of Dr. Moen's Zinterop project, and I worked quite closely with the Radioactive records and the Radioactive perl module that Mike Taylor from Index Data developed. I also helped write some of the documentation for the test scripts that we developed that utilized that perl module. There had been plans in there somewhere to develop a paper that really outlined how to use the perl module to do in-depth Z39.50 interoperability testing, but I graduated, got a full-time position, and haven't had time to think about it much since. I always thought it had a lot of potential. I would be more than happy to lend any help at all in any capacity that I can. I was always a little bit disappointed that we hadn't gotten to really follow up on the project much, and I'm glad to see that there is some interest in it. I just had lunch with Dr. Moen at the OR 2007 conference in San Antonio at the end of last month, so I'm still on very good terms with him and could perhaps help enlist his help, as well. Please let me know how I can help. Work keeps me busy, but I can find time after hours if I need to. It's been over a year, so it might take me a bit to get back into it, but I'm sure it will come back to me. (I hope that didn't sound too desperate.) :-) Thanks, Jason Thomale Metadata Librarian Texas Tech University Libraries > -Original Message- > From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of > Binkley, Peter > Sent: Thursday, February 08, 2007 3:13 PM > To: CODE4LIB@listserv.nd.edu > Subject: [CODE4LIB] Radioactive records for Solr > > In hunting for data to help model subject faceting for MARC records, > I've just been looking at Bill Moen's Zinterop report > (http://www.unt.edu/zinterop/ZInterop2/Documents/ZInterop2FinalReport_we > m4Dec2005.pdf). 
It occurs to me that with all our various projects > working on indexing MARC records in Solr, we should set up and > distribute a set of "radioactive records" to use in each project to > diagnose and compare indexing and querying behaviour. Probably we could > just use the Zinterop records (which are described in detail in that pdf > but aren't available for download anywhere I could find); but we might > want to enhance them with data suitable for testing our faceting > systems. Not sure what that would mean but I thought I'd throw it out. > > If you were at Access '05, you heard Bill describe the Z39.50 testing he > was doing with radioactive records: records with known unique values in > all indexed fields, that could be used for automated testing of Z39.50 > search functionality. The same approach might be very useful as we feel > our way towards a Solr MARC indexing system. > > Has anyone already done something like this? > > Peter > > Peter Binkley > Digital Initiatives Technology Librarian > Information Technology Services > 4-30 Cameron Library > University of Alberta Libraries > Edmonton, Alberta > Canada T6G 2J8 > Phone: (780) 492-3743 > Fax: (780) 492-9243 > e-mail: [EMAIL PROTECTED]
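The testing approach Peter describes lends itself to very simple automation: because every token in a radioactive record is globally unique, any single-token query must return exactly one record. A toy Python illustration of that invariant (the in-memory index and search() below are hypothetical stand-ins for a real Z39.50 or Solr target, not part of the Zinterop code):

```python
def build_index(records):
    """Map each whitespace-delimited token to the set of record ids containing it."""
    index = {}
    for rec_id, text in records.items():
        for token in text.split():
            index.setdefault(token, set()).add(rec_id)
    return index

def search(index, term):
    """Return the sorted list of record ids matching a single-term query."""
    return sorted(index.get(term, set()))

# Two made-up radioactive records, using Rob's token scheme:
records = {
    "rad-1": "ra0271a1r ra0271a2r ra0271a3r",
    "rad-2": "ra0272a1r ra0272a2r ra0272a3r",
}
index = build_index(records)

# The radioactive property: every token resolves to exactly one record,
# so any query returning zero or multiple hits flags an indexing bug.
for token, rec_ids in index.items():
    assert len(rec_ids) == 1, f"token {token} is not unique"

print(search(index, "ra0271a2r"))
# -> ['rad-1']
```

Against a real target, the same loop becomes a compliance test: issue each token as a query via the protocol under test and assert exactly one hit comes back.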
Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl
I have done large file uploads in PHP. Make sure you have the following set in php.ini:

upload_max_filesize =
file_uploads = On
post_max_size =

Some php.ini values can be overridden per script with PHP's ini_set() function, which allows a more granular level of control for security reasons, etc. Note, though, that upload_max_filesize and post_max_size are applied before your script runs, so they can only be set in php.ini or per-directory configuration (e.g. .htaccess), not at runtime. I have never used the form input value, nor should you have to change the memory_limit very much, since the file itself is not loaded into memory, just information regarding the file.

Andrew

Thomas Dowling wrote: I have always depended on the kindness of strange PHP gurus. I am trying to rewrite a perpetually buggy system for uploading large PDF files (up to multiple tens of megabytes) via a web form. File uploads are very simple in PHP, but there's a default maximum file size of 2MB. Following various online hints I've found, I've gone into php.ini and goosed up the memory_limit, post_max_size, and upload_max_size (and restarted Apache), and added an appropriate hidden form input named MAX_FILE_SIZE. The 2MB limit is still in place. Is there something I overlooked? Or, any other suggestions for how to take in a very large file? [My current Perl version has a history of getting incomplete files in a non-negligible percentage of uploads. Weirdness ensues: whenever this happens, the file reliably cuts off at the same point, but the cutoff is not a fixed number of bytes, nor is it related to the size of the file.] -- Thomas Dowling [EMAIL PROTECTED]
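For reference, a php.ini fragment with illustrative values; the 64M figures are examples, not requirements, and post_max_size should be at least as large as upload_max_filesize because the whole POST body counts against it:

```ini
; Illustrative php.ini settings for large uploads (example values)
file_uploads = On
upload_max_filesize = 64M
post_max_size = 64M
```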
Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl
On 2/9/2007 11:50 AM, Jay Luker wrote: > Pre-apologies if this suggestion is too "duh", but have you confirmed > that you updated the correct php.ini file? Bless you! This did the trick, and I can now upload really big files. And I defy anyone to find something too "duh" for me. :-) Yes indeed, as our long-suffering sysadmin has responded to my requests to rebuild PHP with various tweaks, he has created a series of .ini files with PHP version numbers built into their names. Thanks for reminding me that phpinfo would show this. -- Thomas Dowling [EMAIL PROTECTED]
Re: [CODE4LIB] Code4lib blog anthology?
Well jeez, even if you get the selection done you've still got to do the naming contest and the CIP. At Dan's suggestion I'm moving this to code4lib instead of code4libcon; see quoted message below. Do others agree it's worth trying to get this done in, like, the next two weeks, so as to have physical copies to show at code4lib? I'm willing to act as an editor if others will too. Snipped from the message below is the science bloggers' anthology, done on the same basis and pulled together in three weeks: http://scienceblogs.com/clock/2007/01/the_science_blogging_anthology.php

Peter

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Chudnov
Sent: Friday, February 09, 2007 12:12 PM
To: [EMAIL PROTECTED]
Subject: Re: Code4lib blog anthology?

On Feb 9, 2007, at 12:15 PM, Binkley, Peter wrote:
> It's probably too late for this year, but how about trying to put this
> together for next year: pick a few of the best blog entries from the
> previous year from the code4lib community, enough to fill a few dozen
> pages at least, and release them as a book on lulu.com or one of the
> other print-on-demand services.

Three thoughts:

1) cool idea!
2) why is it too late? we could still do it for 2006.
3) we could start up a wiki page *today* with nominations for 2007's second annual edition, and anybody could just add a link/title/author name when they read something they like
4) should this conversation happen on code4lib-list instead of the conference list? (No one expects the spanish inquisition!)
5) we would definitely need to identify editors to assemble and finalize.
[CODE4LIB] Job Opportunity for web developer
WEB DEVELOPER
Location: LIBRARY & INFORMATION CENTER
Job #: CEW6177
Hiring Range: $44,330 to $56,500

Education: Bachelor's Degree in Computer Science or a related field, or an equivalent combination of education and experience.

Experience: Four or more years of work-related experience. Demonstrated work experience with at least one relational database management system; comprehensive knowledge of web scripting languages (such as PHP, ASP, Perl, etc.); proficiency in HTML; knowledge of Unix, Apache, and other open source technologies, Windows 95/98/2000/ME/XP/NT, and MS Office products is strongly preferred. Proficiency with PHP and MySQL, knowledge and experience with Ruby on Rails, and experience with Plone and/or other content management systems is strongly desired. Familiarity with a University and/or Library setting preferred. Selection process will include a background check.

Duties: Develop dynamic web-based applications working closely with designers and other team members. Enable integration of applications with campus and other library initiatives. Manage the Library's Intranet framework; provide support for group interaction and opportunities for library staff to manage their own content. Create efficient and logical interfaces for clients. Recognize system deficiencies and implement effective solutions. Provide backup for general computer support for Systems Department customers as needed. Provide training and support for new technologies. The position requires strong interpersonal communication skills (advising, recommending or counseling, following up, exchanging information, and troubleshooting), attention to data integrity, and a proactive, customer- and detail-oriented disposition. Demonstrated ability to cooperate with a variety of people and achieve results, prioritize multiple tasks effectively, and proactively seek opportunities to broaden and deepen knowledge base and proficiencies will be effective in this position.
Must be able to work independently and within a team environment. All interested candidates must apply through the GT Office of Human Resources at: https://ea.ohr.gatech.edu/FullDescription.asp?jobid=CEW6177&type=3&typeofjob=ext&jobtitle=WEB%20DEVELOPER