Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Erik Hatcher

On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:

> Here's the set that I generated a while ago - it's quite big as it
> covers the full Marc21 field and subfield set for bibliographic
> records.
> I'm releasing these under the terms of our Talis Community License.
> (http://www.talis.com/tdn/tcl)


IANAL, so to clarify this license, would it be ok for me to check
this into Solr's repository at Apache (keeping the license file
alongside)?

I'm not quite sure what we'd do with this data just yet, as it looked
like gibberish at first blush, but, looking at the document Peter
linked to, it is by definition supposed to be this way and not overlap
with real data.


> Would people be interested in a write-up of how we've used
> RadioactiveMarc and automated tests to validate Bath and US National
> Profile compliance?


Absolutely.  This would certainly factor into my Solr efforts in
crafting more automated tests.

   Erik


Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Rob Styles
Yes, it would be ok to check this in with a copy of the license
alongside it - or even just a readme with a link to the license would be
fine.

The approach is a formulaic one: there are 10 records in there, each
built up using the same algorithm. You can add these to a store of
legitimate MARC data with very little chance of them appearing in search
results unless searched for specifically.
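A toy sketch of why mixing them in is safe, assuming a naive lowercase whitespace tokenizer and an in-memory inverted index (the helpers `build_index` and `radioactive_collisions` are hypothetical, not Solr's analysis chain or Talis's tooling): after radioactive records are added to legitimate ones, every radioactive token should still map back to exactly one record.

```python
from collections import defaultdict


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase, whitespace-separated word to the ids of records containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def radioactive_collisions(records: dict[str, str],
                           radioactive_ids: set[str]) -> list[tuple[str, list[str]]]:
    """Return (word, hits) for every radioactive word that does not hit only its own record."""
    index = build_index(records)
    problems = []
    for record_id in sorted(radioactive_ids):
        for word in records[record_id].lower().split():
            hits = index[word]
            if hits != {record_id}:
                problems.append((word, sorted(hits)))
    return problems


if __name__ == "__main__":
    store = {
        "legit-1": "Moby Dick or The Whale",
        "radio-1": "ra0271a1r ra0271a2r ra0271a3r",
    }
    print(radioactive_collisions(store, {"radio-1"}))  # → []
```

An empty list means every radioactive token is unique in the combined store; any entry names the token and the records it leaked into.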

Taking just one field out of one record (in MarcEdit syntax)...

=027  \\$ara0271a1r ra0271a2r ra0271a3r$zra0271z1r ra0271z2r ra0271z3r$6ra027161r ra027162r ra027163r$8ra027181r ra027182r ra027183r
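The formula behind that field can be sketched in a few lines. This is a reconstruction from the single 027 example, not Talis's actual generator: the function names are hypothetical, `\\` is taken to be MarcEdit's notation for two blank indicators, each token is pad + record type + tag + occurrence + subfield code + position + pad, and only variable fields (with indicators and subfields) are handled.

```python
def radioactive_token(record_type: str, tag: str, occurrence: int,
                      code: str, position: int, pad: str = "r") -> str:
    """One token: pad, record type, tag, field occurrence, subfield code, token position, pad."""
    return f"{pad}{record_type}{tag}{occurrence}{code}{position}{pad}"


def radioactive_field(record_type: str, tag: str, occurrence: int, codes: str,
                      tokens_per_subfield: int = 3, indicators: str = "\\\\") -> str:
    """Render one variable field in MarcEdit mnemonic syntax: =TAG, two spaces, indicators, $subfields."""
    subfields = []
    for code in codes:
        words = " ".join(
            radioactive_token(record_type, tag, occurrence, code, n)
            for n in range(1, tokens_per_subfield + 1)
        )
        subfields.append(f"${code}{words}")
    return f"={tag}  {indicators}" + "".join(subfields)


if __name__ == "__main__":
    # Should reproduce the 027 field quoted above (record type 'a', first occurrence).
    print(radioactive_field("a", "027", 1, "az68"))
```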

Each subfield contains 3 tokens (words). For subfield a:

ra0271a1r ra0271a2r ra0271a3r

'r' at the start and end of each token is a padding character for
testing truncation, 'a' is the record type, '027' the field, '1' the
occurrence of that field, 'a' the subfield code, '1', '2', '3' the
occurrence of the token in the subfield. This allows all of the
truncation, word, phrase, completeness and position combinations to be
tested separately - with just one record coming back for each.
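Going the other way, a test harness needs to know which record, field, subfield and position a hit came from. A minimal decoder, assuming (as in the 027 example) a one-letter record type, a three-digit tag, a one-character subfield code, and single-digit occurrence and position numbers; multi-digit numbers would make this layout ambiguous. `truncation_stems` is a hypothetical illustration of how the 'r' padding might be used: with no stemming in the index, a stem plus a wildcard should match the token while the bare stem should not.

```python
import re
from typing import NamedTuple

# Layout assumed from the example: pad, record type, tag, occurrence,
# subfield code, token position, pad (e.g. "ra0271a1r").
TOKEN_RE = re.compile(
    r"r(?P<record_type>[a-z])(?P<tag>\d{3})(?P<occurrence>\d)"
    r"(?P<code>[a-z0-9])(?P<position>\d)r"
)


class RadioactiveToken(NamedTuple):
    record_type: str
    tag: str
    occurrence: int
    code: str
    position: int


def decode(token: str) -> RadioactiveToken:
    """Split a radioactive token into its parts; raise ValueError if it does not fit the layout."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a radioactive token: {token!r}")
    return RadioactiveToken(
        record_type=match["record_type"],
        tag=match["tag"],
        occurrence=int(match["occurrence"]),
        code=match["code"],
        position=int(match["position"]),
    )


def truncation_stems(token: str) -> dict[str, str]:
    """Drop the padding to get stems for right, left and two-sided truncation probes."""
    return {"right": token[:-1], "left": token[1:], "both": token[1:-1]}


if __name__ == "__main__":
    print(decode("ra027161r"))
    # RadioactiveToken(record_type='a', tag='027', occurrence=1, code='6', position=1)
    print(truncation_stems("ra0271a1r"))
    # {'right': 'ra0271a1', 'left': 'a0271a1r', 'both': 'a0271a1'}
```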

rob

Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257
msn: [EMAIL PROTECTED]
irc: irc.freenode.net/mrob,isnick


> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Erik Hatcher
> Sent: 09 February 2007 10:18
> To: CODE4LIB@listserv.nd.edu
> Subject: Re: [CODE4LIB] Radioactive records for Solr
>
> On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:
> > Here's the set that I generated a while ago - it's quite big as it
> > covers the full Marc21 field and subfield set for bibliographic
> > records.
> > I'm releasing these under the terms of our Talis Community License.
> > (http://www.talis.com/tdn/tcl)
>
> IANAL, so to clarify this license, would it be ok for me to check
> this into Solr's repository at Apache (keeping the license file
> alongside)?
>
> I'm not quite sure what we'd do with this data just yet, as it looked
> like gibberish at first blush, but, looking at the document Peter
> linked to, it is by definition supposed to be this way and not overlap
> with real data.
>
> > Would people be interested in a write-up of how we've used
> > RadioactiveMarc and automated tests to validate Bath and US National
> > Profile compliance?
>
> Absolutely.  This would certainly factor into my Solr efforts in
> crafting more automated tests.
>
> Erik


[CODE4LIB] Job Posting: Data Services Librarian, UCSD Libraries

2007-02-09 Thread Declan Fleming
THE UCSD LIBRARIES - University of California, San Diego

Data Services Librarian

Social Sciences & Humanities Library

Preferred appointment level: Assistant Librarian I -Librarian I with an 
approximate salary range of $40,008-$66,756

The Libraries of the University of California, San Diego (UCSD) seek 
applications from innovative and user-oriented library professionals to join 
the enthusiastic staff of the Social Sciences and Humanities Library in the 
development of data services and collections. The UCSD Libraries are committed 
to making access to research information for faculty and students as efficient 
and convenient as technology, innovation, and resources will allow.

Responsibilities of the Position
Reporting to the Head for Data, Government and
Geographic Information Services, assumes a leadership role in planning and
developing data services and collections at UCSD. Evaluates, selects, and
acquires data products for the Social Science Data Collection (SSDC). Provides
consultations and instruction on finding and using data. Works closely with
faculty to determine data needs and discovery tools.  Actively collaborates
with library liaisons to academic departments to develop and provide
coordinated services to meet the full range of social science data needs at
UCSD. Plans and implements staff training programs related to data.  Serves as
the liaison to ICPSR and DDI initiatives.  Provides general reference
assistance at a combined social sciences/humanities and government documents
reference desk. Some evening and weekend reference hours required. Represents
the UCSD Libraries at pertinent meetings and conferences. UCSD Librarians are
expected to participate in library-wide planning and professional activities
appropriate to their position. Appointment at the Librarian rank requires
substantial relevant experience and superior qualifications.

Required Qualifications
Professional degree from a library school or other appropriate degree 
or equivalent experience in one or more fields relevant to library services.
Experience working with numeric data resources, including the ability 
to install and support relevant software applications.
Ability to download and transform data files to meet the various needs 
of users.
Proficient with social sciences statistical packages and software, such 
as SAS and SPSS.
Experience working with users to provide data services support.
Ability to develop effective relationships with the library and campus 
IT departments.
Demonstrated knowledge of a wide range of print and electronic 
reference and bibliographic tools in the social sciences.
Experience in collection development activities in an academic library.
Experience using government information.
Superior organizational, analytical, and communication skills.
Candidate must have a strong commitment to excellence in service and be 
able to work both independently and collaboratively in a complex changing 
environment.
Potential to excel in a dynamic, academically challenging environment.

UCSD librarians are expected to participate in library-wide and system-wide 
planning and governance, and to be professionally active.

Desirable Qualifications
Undergraduate and/or graduate degree in the social sciences.
Experience providing reference service in an academic or research 
library serving similar clientele.
Experience with GIS data and applications

For complete details, see http://orpheus.ucsd.edu/fac/SSHLDataLibrarian.htm.

Application consideration begins March 23, 2007 and will continue until the 
position is filled. Send application letter including a statement of 
qualifications, a full resume of education and relevant experience, and the 
names of at least three persons who are knowledgeable about your qualifications 
for this position to [EMAIL PROTECTED] or to UCSD, Stacey McDermaid - Library 
Human Resources, 9500 Gilman Drive Dept. 0175-H, La Jolla, CA 92093-0175. 
Telephone: 858.534.1279; Confidential Fax: 858.534.8634.  Equal 
opportunity/affirmative action employer


Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Nathan Vack

On Feb 9, 2007, at 2:58 AM, Rob Styles wrote:


> Would people be interested in a write-up of how we've used
> RadioactiveMarc and automated tests to validate Bath and US National
> Profile compliance?


zOMG, yes!

-Nate
Wendt Library
UW - Madison


Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Doran, Michael D
> ... the Zinterop records (which are described
> in detail in that pdf but aren't available for
> download anywhere I could find ...

I sent Bill Moen an email asking if the records described in the report were 
available for download, and if not, could they be made available.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/


> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On
> Behalf Of Binkley, Peter
> Sent: Thursday, February 08, 2007 3:13 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Radioactive records for Solr
>
> In hunting for data to help model subject faceting for MARC
> records, I've just been looking at Bill Moen's Zinterop
> report
> (http://www.unt.edu/zinterop/ZInterop2/Documents/ZInterop2FinalReport_wem4Dec2005.pdf).
> It occurs to me that with all our various
> projects working on indexing MARC records in Solr, we should
> set up and distribute a set of "radioactive records" to use
> in each project to diagnose and compare indexing and querying
> behaviour. Probably we could just use the Zinterop records
> (which are described in detail in that pdf but aren't
> available for download anywhere I could find); but we might
> want to enhance them with data suitable for testing our
> faceting systems. Not sure what that would mean but I thought
> I'd throw it out.
>
> If you were at Access '05, you heard Bill describe the Z39.50
> testing he was doing with radioactive records: records with
> known unique values in all indexed fields, that could be used
> for automated testing of Z39.50 search functionality. The
> same approach might be very useful as we feel our way towards
> a Solr MARC indexing system.
>
> Has anyone already done something like this?
>
> Peter
>
> Peter Binkley
> Digital Initiatives Technology Librarian
> Information Technology Services
> 4-30 Cameron Library
> University of Alberta Libraries
> Edmonton, Alberta
> Canada T6G 2J8
> Phone: (780) 492-3743
> Fax: (780) 492-9243
> e-mail: [EMAIL PROTECTED]
>


[CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Thomas Dowling
I have always depended on the kindness of strange PHP gurus.

I am trying to rewrite a perpetually buggy system for uploading large
PDF files (up to multiple tens of megabytes) via a web form.  File
uploads are very simple in PHP, but there's a default maximum file size
of 2MB.  Following various online hints I've found, I've gone into
php.ini and goosed up the memory_limit, post_max_size, and
upload_max_filesize (and restarted Apache), and added an appropriate hidden
form input named MAX_FILE_SIZE.  The 2MB limit is still in place.

Is there something I overlooked?  Or, any other suggestions for how to
take in a very large file?

[My current Perl version has a history of getting incomplete files in a
non-negligible percentage of uploads.  Weirdness ensues: whenever this
happens, the file reliably cuts off at the same point, but the cutoff is
not a fixed number of bytes, nor is it related to the size of the file.]


--
Thomas Dowling
[EMAIL PROTECTED]


Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Andrew Darby

I haven't needed to upload such large files, but I wonder if using the
ftp functions in php would bypass this problem:

http://us3.php.net/manual/en/ref.ftp.php

Andrew

--
Andrew Darby
Web Services Librarian
Ithaca College Library
http://www.ithaca.edu/library/
[EMAIL PROTECTED]


Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Jay Luker

Pre-apologies if this suggestion is too "duh", but have you confirmed
that you updated the correct php.ini file?

One way to check is to create a temporary, web-accessible file
(phpinfo.php) with the contents:

    <?php phpinfo(); ?>

Load that in your browser and check that the path of the php.ini file
Apache is using matches the file you've updated.

I haven't touched php since v4.x, so there may be an easier way to
confirm this nowadays. Also, you should be sure to delete that
phpinfo.php file when you're done so you're not unnecessarily
publishing your server details.

--jay



Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Thomale, J
I just saw this message as well as the follow-ups today. I was one of
the graduate assistants working on this phase of Dr. Moen's Zinterop
project, and I worked quite closely with the Radioactive records and the
Radioactive perl module that Mike Taylor from Index Data developed. I
also helped write some of the documentation for the test scripts that we
developed that utilized that perl module. There had been plans in there
somewhere to develop a paper that really outlined how to use the perl
module to do in-depth Z39.50 interoperability testing, but I graduated,
got a full-time position, and haven't had time to think about it much
since. I always thought it had a lot of potential.

I would be more than happy to lend any help at all in any capacity that
I can. I was always a little bit disappointed that we hadn't gotten to
really follow up on the project much, and I'm glad to see that there is
some interest in it. I just had lunch with Dr. Moen at the OR 2007
conference in San Antonio at the end of last month, so I'm still on very
good terms with him and could perhaps enlist his help as well.

Please let me know how I can help. Work keeps me busy, but I can find
time after hours if I need to. It's been over a year, so it might take
me a bit to get back into it, but I'm sure it will come back to me.

(I hope that didn't sound too desperate.) :-)

Thanks,

Jason Thomale
Metadata Librarian
Texas Tech University Libraries





Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Andrew Nagy

I have done large file uploads in PHP.  Make sure you have the following
set in php.ini:

upload_max_filesize = 
file_uploads = on
post_max_size = 

Also, you can set these values through the ini_set function in PHP, so
that they apply per script instead of to every script; this allows a
more granular level of control for security reasons, etc.

I have never used the form input value, nor should you have to change
the memory_limit very much since the file itself is not loaded into
memory, just information regarding the file.
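As a cross-check on these settings, here is a rough Python sketch that reads a php.ini and flags the usual obstacles to large uploads. Assumptions: it handles only `key = value` lines with `;` comments and PHP's K/M/G size shorthand (no quoted values containing `;`, no special handling for zero or negative values), and it falls back to PHP's stock defaults (`upload_max_filesize` 2M, `post_max_size` 8M) when a key is missing. It only helps if pointed at the php.ini that phpinfo() reports as loaded. Note also that the PHP manual lists `upload_max_filesize` and `post_max_size` as PHP_INI_PERDIR, so they are normally set in php.ini, .htaccess or httpd.conf rather than at runtime from a script.

```python
import re

# PHP's shorthand byte suffixes in php.ini (case-insensitive): K, M, G.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def ini_bytes(value: str) -> int:
    """Convert a php.ini size such as '2M' or '512k' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmg]?)\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2).lower()]


def parse_ini(text: str) -> dict[str, str]:
    """Very small php.ini reader: key = value lines, ';' comments, [sections] ignored."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def upload_problems(text: str, largest_file: int) -> list[str]:
    """List reasons a file of `largest_file` bytes would be rejected under these settings."""
    settings = parse_ini(text)
    problems = []
    if settings.get("file_uploads", "On").lower() not in ("on", "1", "true", "yes"):
        problems.append("file_uploads is disabled")
    upload = ini_bytes(settings.get("upload_max_filesize", "2M"))
    post = ini_bytes(settings.get("post_max_size", "8M"))
    if upload < largest_file:
        problems.append(f"upload_max_filesize ({upload} bytes) is below {largest_file} bytes")
    # The whole multipart POST body must fit, so post_max_size should exceed upload_max_filesize.
    if post <= upload:
        problems.append(
            f"post_max_size ({post} bytes) is not larger than upload_max_filesize ({upload} bytes)"
        )
    return problems


if __name__ == "__main__":
    sample = """
[PHP]
file_uploads = On
upload_max_filesize = 64M   ; big PDFs
post_max_size = 8M
"""
    for problem in upload_problems(sample, 50 * 1024 ** 2):
        print(problem)
```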

Andrew


Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Thomas Dowling
On 2/9/2007 11:50 AM, Jay Luker wrote:

> Pre-apologies if this suggestion is too "duh", but have you confirmed
> that you updated the correct php.ini file?


Bless you!  This did the trick, and I can now upload really big files.
And I defy anyone to find something too "duh" for me.  :-)

Yes indeed, as our long-suffering sysadmin has responded to my requests
to rebuild PHP with various tweaks, he has created a series of .ini
files with PHP version numbers built into their names.

Thanks for reminding me that phpinfo would show this.


--
Thomas Dowling
[EMAIL PROTECTED]


Re: [CODE4LIB] Code4lib blog anthology?

2007-02-09 Thread Binkley, Peter
Well jeez, even if you get the selection done you've still got to do the
naming contest and the CIP.

At Dan's suggestion I'm moving this to code4lib instead of code4libcon;
see quoted message below. Do others agree it's worth trying to get this
done in, like, the next two weeks, so as to have physical copies to show
at code4lib? I'm willing to act as an editor if others will too.

Snipped from the message below is the science bloggers' anthology, done
on the same basis and pulled together in three weeks:

http://scienceblogs.com/clock/2007/01/the_science_blogging_anthology.php

Peter

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Daniel Chudnov
Sent: Friday, February 09, 2007 12:12 PM
To: [EMAIL PROTECTED]
Subject: Re: Code4lib blog anthology?


On Feb 9, 2007, at 12:15 PM, Binkley, Peter wrote:

> It's probably too late for this year, but how about trying to put this
> together for next year: pick a few of the best blog entries from the
> previous year from the code4lib community, enough to fill a few dozen
> pages at least, and release them as a book on lulu.com or one of the
> other print-on-demand services.

Three thoughts:

1) cool idea!
2) why is it too late?  we could still do it for 2006.
3) we could start up a wiki page *today* with nominations for 2007's
second annual edition, and anybody could just add a link/title/author
name when they read something they like
4) should this conversation happen on code4lib-list instead of the
conference list?

(No one expects the Spanish Inquisition!)

5) we would definitely need to identify editors to assemble and
finalize.




[CODE4LIB] Job Opportunity for web developer

2007-02-09 Thread Ross Singer

WEB DEVELOPER Location:  LIBRARY & INFORMATION CENTER
Job #: CEW6177 Hiring Range: $44,330 to $56,500

Education: Bachelor's Degree in Computer Science or a related field or
equivalent combination of education and experience.

Experience: Four or more years of work-related experience.
Demonstrated work experience with at least one relational database
management system; comprehensive knowledge of web scripting languages
(such as PHP, ASP, Perl, etc.); proficiency in HTML; knowledge of Unix,
Apache, and other open source technologies, Windows
95/98/2000/ME/XP/NT and MS Office products is strongly preferred.
Proficiency with PHP and MySQL; knowledge and experience with Ruby on
Rails; experience with Plone and/or other content management system is
strongly desired. Familiarity in a University and/or Library setting
preferred. Selection process will include a background check.

Duties: Develop dynamic web-based applications working closely with
designers and other team members. Enable integration of applications
with campus and other library initiatives. Manage the Library's
Intranet framework; provide support for group interaction and
opportunities for library staff to manage their own content. Create
efficient and logical interfaces for clients. Recognize system
deficiencies and implement effective solutions. Provide backup for
general computer support for Systems Department customers as needed.
Provide training and support for new technologies. The position
requires strong interpersonal communication skills, such as advising,
recommending or counseling, follow-up, exchanging information and
troubleshooting; the ability to ensure data integrity; and a proactive,
customer-oriented and detail-oriented disposition. Demonstrated ability to
cooperate with a variety of people and achieve results, prioritize multiple
tasks effectively, and proactively seek opportunities to broaden and deepen
one's knowledge base and proficiencies will help the successful candidate
be effective in this position.
Must be able to work independently and within a team environment.


All interested candidates must apply through the GT Office of Human
Resources at:

https://ea.ohr.gatech.edu/FullDescription.asp?jobid=CEW6177&type=3&typeofjob=ext&jobtitle=WEB%20DEVELOPER