Hi folks: Based on a chat on the #OpenILS-Evergreen channel tonight, Josh Ferraro from LibLime gave me the okay to post the Z39.50 server implementation he had submitted to the Evergreen project back in the pre-1.0 days; the patch had unfortunately been lost when the mailing lists for the project were converted over to the mailman server and the previous list archives were lost.
The code is under a GPL license, and probably needs some polishing and updating, but I am happy to bring it forward in case someone with the interest / time / capabilities wants to pick it up and help fill a functionality gap in the short term. Thanks Josh! -- Dan Scott Laurentian University
#!/usr/bin/perl # Implements a very basic Z39.50 server for Evergreen # Copyright 2005 (C) Georgia Public Library Service # Joshua Ferraro <[EMAIL PROTECTED]> # # use perldoc Z3950Server.pl for nicely formatted documentation. # =head1 NAME Z3950Server.pl - A Z39.50 Server for the Evergreen Integrated Library System (http://openils.org) =head1 SYNOPSIS # su opensrf $ ./Z3950Server.pl -D -d Z3950Server 192.168.1.1:9999 replacing 192.168.1.1 with your IP address and 9999 with the desired port number. This will run the server in Daemon mode ... or ... $ ./Z3950Server.pl [-T ] [-a file ] [-v level ] [-l file ] [-u uid ] [-c config ] [-f vconfig ] [-C fname ] [-t minutes ] [-k kilobytes ] [-d daemon ] [-w dir ] [-p pidfile ] [-ziDST1] [listener-spec...] See NOTES for details on application invocation options. =head1 DESCRIPTION Implements a Z39.50 Server for the Evergreen Open Source Integrated Library System (http://openils.org). Currently, this Z39.50 server almost conforms to the Bath profile in Functional Area A at Conformance Level 0. The goal is Functional Areas A and B and Conformance Level 1 before the system is placed in production. See NOTES for details on the Bath Profile. =head1 PREREQUISITES Yaz - http://indexdata.dk/yaz Net::Z3950::SimpleServer - on CPAN MARC::Record - CPAN version currently has bugs, use the sourceforge version. See NOTES and BUGS for details. MARC::File::XML - see NOTES and BUGS OpenSRF and OpenILS - available from http://openils.org =head1 SEE ALSO Net::Z3950::SimpleServer The Yaz User's Guide and Reference - http://www.indexdata.dk/yaz/doc =head1 BUGS =head2 CHILD PROCESSES There is a problem with child processes as spawned within the Yaz libraries and Perl's way of finding the current process ($$). This breaks OpenILS's Jabber calls because they rely on the Perl having the correct PID for the current process in $$. Here is a patch for SimpleServer that updates Perl's $$, and fixes the problem. Thanks to Adam at Index Data for submitting this patch in less than an hour after the problem was identified. Index: SimpleServer.xs =================================================================== RCS file: /home/cvsroot/simpleserver/SimpleServer.xs,v retrieving revision 1.35 diff -u -r1.35 SimpleServer.xs --- SimpleServer.xs 9 Nov 2005 09:35:47 -0000 1.35 +++ SimpleServer.xs 30 Jan 2006 18:58:50 -0000 @@ -1182,9 +1182,17 @@ SV *handle; HV *href; SV **temp; + GV* tmpgv; + ENTER; SAVETMPS; + + if ((tmpgv = gv_fetchpv("$",TRUE, SVt_PV))) { + SvREADONLY_off(GvSV(tmpgv)); + sv_setiv(GvSV(tmpgv), getpid()); + SvREADONLY_on(GvSV(tmpgv)); + } zhandle->nmem = nmem; zhandle->stop_flag = 0; To apply the patch to SimpleServer, do the following: # cd /root/.cpan/build/Net-Z3950-SimpleServer-0.08/ Use your favorite editor to add the lines above with a + to the SimpleServer.xs right around line 1182 ... then, run: # perl Makefile.pl # make # make install I know this works with Perl version 5.8.x, but it's untested with previous versions and I'm not sure whether SvREADONLY_off/on will continue to be maintained in future versions of Perl. If you can't patch SimpleServer, you can run the server in 'single' mode with the -S option, which solves the problem as well (since there is only one PID to begin with). However, keep in mind, it's a single-process, single-thread option, and it could very easily create bottlenecks if the backend doesn't result quickly to all requests... basically -S has *never* been meant for any kind of production use. =head2 UTF-8 ENCODED RECORDS Records coming out with utf-8 encoding will throw some errors if you're running the 1.x series of MARC::Record. There's a possibility this is due to the CPAN version of MARC::Record (1.x) as_usmarc(); method calculating the directory offsets incorrectly for utf-8 (at least for chars outside the ascii range). Whatever the reason, upgrading to MARC::Record 2.0RC1 fixes the problem. Upgrade to MARC::Record or download it from Sourceforge: http://sourceforge.net/project/showfiles.php?group_id=1254 , unpack it in a directory and add a use lib "directory/you/unpacked/it/to"; line for that version in this script where indicated. =head2 OTHER If you run the server as a foreground process, instead of as a daemon (with -D -d daemonname), the script won't exit with Ctrl-C unless you reset the SIG{INT} to 'DEFAULT' (see commented out code). Please report any other bugs to Joshua Ferraro <[EMAIL PROTECTED]> =head1 TODO add support for SUTRS, XML and UNIMARC clean up error conditions (return appropriate diagnostics) need to support more attribute types and values structure 101, normalized precision matched for name headings =cut use strict; use warnings; use Net::Z3950::SimpleServer; ## comment out if you're planning on running this from the # command line : it will make it easier to kill the process # using sigterm (CTRL-C) $SIG{INT} = 'DEFAULT'; # FIXME: this path probably shouldn't be hard-coded use lib "/openils/lib/perl5/"; # alternative method of daemonizing using OpenSRF's # daemonize methods #use OpenSRF::Utils qw/:daemon/; #daemonize('PINES Z39.50 Server'); use OpenSRF::System qw|/openils/conf/bootstrap.conf|; use OpenSRF::AppSession; use OpenILS::Utils::Fieldmapper; # Change this to the path for your MARC::Record 2.0 use lib "/home/jmf/working/perlmods/MARC-Record-2.0/lib"; use MARC::File::XML; use MARC::Record; use MARC::Charset; ## Some configuration stuff # valid database names my $dbs = { "PINES" => "PINES", }; ## Z39.50 handler stuff and launch a server my $handler = Net::Z3950::SimpleServer->new(INIT => \&init_handler, CLOSE => \&close_handler, SEARCH => \&search_handler, FETCH => \&fetch_handler); $handler->launch_server("Z3950Server.pl", @ARGV); ## Supporting subs sub init_handler { my ($args)[EMAIL PROTECTED]; ## check process use Linux::Pid; if (Linux::Pid::getpid() != $$) { print "WARNING - You need to patch SimpleServer - WARNING\n"; print "\tPerl thinks the PID is:".$$."\n"; print "\tReal PID: ".Linux::Pid::getpid()." PPID:".Linux::Pid::getppid()."\n"; print "\tSee the pod documentation for more details\n"; } ## Setup a session handler object my $session = {}; $args->{HANDLE} = $session; ## Bootstrap OpenSRF::System->bootstrap_client; ## Create a OpenSRF session object my $ses = OpenSRF::AppSession->create('open-ils.storage'); $args->{IMP_NAME} = ""; $args->{IMP_VER} = ""; $args->{ERR_CODE} = 0; ## Save session stuff in the session variable $args->{HANDLE} = {"ses" => $ses, }; # How to handle authentication if we ever need it if (defined($args->{PASS}) && defined($args->{USER})) { printf("Received USER/PASS=%s/%s\n", $args->{USER},$args->{PASS}); } } sub close_handler { my ($args)[EMAIL PROTECTED]; my $ses = $args->{HANDLE}->{ses}; $ses->finish; print "Closing connection: ".$ses."\n"; } sub search_handler { my($args)[EMAIL PROTECTED]; my $set_id=$args->{SETNAME}; #for this session ## Check DB Settings my $db; # not that we'll ever need this monster, but: my $query=$args->{QUERY}; # Makes recursive programming almost not seem like work at all :) my ($term,$pquery) = eval { $args->{RPN}->{query}->render($args) }; #print "Term as parsed from RPN:".$term."\n"; #print "Zsearch as parsed from RPN:".$pquery->{'zsearch'}."\n"; #print "Restrict as parsed from RPN:".$pquery->{'restrict'}."\n"; if ($@) { # process errors in parsing query $args->{ERR_CODE} = [EMAIL PROTECTED]>{err_code}; if ([EMAIL PROTECTED]>{err_str}) { $args->{ERR_STR} = [EMAIL PROTECTED]>{err_str}; } return; } ## Save session stuff in the session variable $args->{HANDLE} = {'term' => $term, #pquery->{'term'}, 'zsearch' => $pquery->{'zsearch'}, 'restrict' => $pquery->{'restrict'}, 'ses' => $args->{HANDLE}->{'ses'}, }; ## Perform the query on the specified set of databases ## and return the number of hits: my $hits = number_of_hits($args); $args->{HITS} = $hits; ## Save session stuff in the session variable $args->{HANDLE}->{'hits'} = $hits; ## print to the log print "got search: ", $pquery->{'term'}, "\n"; } sub fetch_handler { my ($args)[EMAIL PROTECTED]; #my $query = $args->{HANDLE}->{query}; print "zsearch in fetch:".$args->{HANDLE}->{'zsearch'}."\n"; my $hits = $args->{HANDLE}->{hits}; my $ses = $args->{HANDLE}->{ses}; my $set_id = $args->{SETNAME}; ## Format requested from client my $req_form = '1.2.840.10003.5.10'; #MARC21 is default if ($args->{REQ_FORM}) { #Client requested format OID $req_form = $args->{REQ_FORM}; } # We only support MARC21 currently # FIXME: it would be trivial to support Unimarc, STURS, and text unless ($req_form eq "1.2.840.10003.5.10") { die {'err_code' => 114, 'err_str' => 'requested format unavailable'}; #FIXME: need the right error code and error string here } $args->{REP_FORM} = $req_form; # Set provided format OID my $comp = $args->{COMP}; # Formatting instructions ignored my $record = fetch_a_record($args); #$query,$attr_name,$args->{OFFSET},$ses); $args->{RECORD} = $record; if ($hits==$args->{OFFSET}) { ## Last record in set? $args->{LAST} = 1; } else { $args->{LAST} = 0; } } sub number_of_hits { my ($args)[EMAIL PROTECTED]; my $term = $args->{HANDLE}->{'term'}; my $zsearch = $args->{HANDLE}->{'zsearch'}; my $restrict = $args->{HANDLE}->{'restrict'}; my $ses = $args->{HANDLE}->{'ses'}; my $res = $ses->request( $zsearch, term => $term, restrict => $restrict, limit => 1 )->gather(1); my $total = $res->[0]->[2]; return $total; } # fetch one record at a time sub fetch_a_record { my ($args)[EMAIL PROTECTED]; my $term = $args->{HANDLE}->{'term'}; my $zsearch = $args->{HANDLE}->{'zsearch'}; my $restrict = $args->{HANDLE}->{'restrict'}; my $offset = $args->{OFFSET}; # Offset is 0-based in Evergreen $offset-=1; my $ses = $args->{HANDLE}->{'ses'}; my $res = $ses->request( $zsearch, term => $term, restrict => $restrict, offset => $offset, limit => 1 )->gather(1); my $records = $ses->request( 'open-ils.storage.direct.biblio.record_entry.batch.retrieve.atomic', map { $_->[0] } @$res )->gather(1); for my $rec ( @$records ) { #but there's only ever one #print $rec->marc; my $marc21_record = xml_to_marc21($rec->marc); return $marc21_record; } } # convert from MARCXML to MARC21 sub xml_to_marc21 { my ($xml)[EMAIL PROTECTED]; my $record = MARC::Record->new_from_xml($xml); my $marc21 = $record->as_usmarc; return $marc21; } ## Augmented Classes follow ... see the pod documentation for details package Net::Z3950::RPN::Term; sub render { my ($args)[EMAIL PROTECTED]; my $field = 'keyword'; # This is our default my $fieldtype = 'character'; my $relation = undef; my $left_anchor = 0; my $complete_subfield = 0; my $right_truncation = 0; my $structure = 0; my $term = $args->{term}; my $zsearch = "open-ils.storage.metabib.keyword.Zsearch.atomic"; # Default string we use to query srfsh my $restrict; # Restrictions for full MARC searching my $answer; # Final returned result object with term,zsearch ## Preliminarily, check the database name and make sure there is only # one database name in the request and that it's the right one my $db; my $atts = $args->{attributes}; untie $atts; ## First we determine USE attribute foreach (@$atts) { my $type = $_->{attributeType}; my $value = $_->{attributeValue}; #m4, 21, 31, 1003, 1007: title, subject heading, date of publication, author, identifier-standard if ($type == 1) { # USE # MODS searching if ($value==1016) { #1016 == Any $field = 'keyword'; $fieldtype = 'character'; } elsif ($value==4) { #4 == Title $field = 'title'; $fieldtype = 'character'; } elsif ($value==21) { #21 == Subject $field = 'subject'; $fieldtype = 'character'; } elsif ($value==1003) { #1003 == Author $field = 'author'; $fieldtype = 'character'; } elsif ($value==5) { #5 == Series $field = 'series'; $fieldtype = 'character'; # Full MARC Searching } elsif ($value==1007) { #1007 == isbn $field = ''; $fieldtype = 'character'; $restrict = { tag => 020, subfield => 'a' }; } elsif ($value!=1016) { # unknown/unsupported die {'err_code' => 114, 'err_str' => $value}; } } } # Then we can handle any other attributes foreach (@$atts) { my $type = $_->{attributeType}; my $value = $_->{attributeValue}; if ($type == 2) { # Relation if ($fieldtype eq 'character') { if ($value != 3) { die {'err_code' => 123, 'err_str' => $value}; } } else { # Here's how we could handle Integer or date fields # if Evergreen supported them die {'err_code' => 123, 'err_str' => $value}; #if ($value == 1) { $relation = '<'; } #elsif ($value == 2) { $relation = '<='; } #elsif ($value == 3) { $relation = '='; } #elsif ($value == 4) { $relation = '>='; } #elsif ($value == 5) { $relation = '>'; } #elsif ($value == 6) { $relation = '!='; } #else { # die {'err_code' => 123, 'err_str' => $value}; #} } } elsif ($type == 3) { # Position if ($value == 1 || $value == 2) { $left_anchor = 1; } elsif ($value != 3) { die {'err_code' => 119, 'err_str' => $value}; } } elsif ($type == 4) { # Structure #FIXME: this doesn't do anything yet if ($value == 101) { $structure = 1; } elsif ($value == 2) { $structure = 2; } elsif ($value != 1) { die {err_code => 120, err_str => $value}; #FIXME: add appropriate error string } # we can handle this in Evergreen thusly: # quoted multi-word string == phrase, quoted single word == word, unquoted words == normalized where normalized == stemmed search } elsif ($type == 5) { # Truncation -- we ignore it for now if ($value == 1) { $right_truncation = 1; } elsif ($value != 100) { die {err_code => 120, err_str => $value}; } } elsif ($type == 6) { # Completeness -- we ignore it for now if ($fieldtype eq 'character') { if ($value == 2 || $value == 3) { $complete_subfield = 1; } } elsif ($value != 2 && $value != 3) { die {'err_code' => 122, 'err_str' => $value}; } } elsif ($type != 1) { # Unknown/unsupported attribute type die {'err_code' => 113, 'err_str' => $type}; } } if ($fieldtype eq 'character') { # Here's how we could handle other attribute types, etc. # $relation = ''; #would be equiv to SQL LIKE #$term = "*$term" unless $complete_subfield or $left_anchor; #$term = "$term*" unless $complete_subfield and !$right_truncation; #$term = "\$$field $relation '$term'"; } else { if (!$relation) { $relation = ''; # would be equiv to SQL '=' } #$term = "\$$field $relation $term"; } # Build and return our final object if ($restrict) { $zsearch = "open-ils.storage.metabib.full_rec.Zmulti_search.atomic"; } else { $zsearch = "open-ils.storage.metabib.".$field.".Zsearch.atomic"; } $answer = { term => $term, zsearch => $zsearch, restrict => $restrict, }; return ($term,$answer); } package Net::Z3950::RPN::And; sub render { my ($args) = @_; my ($var1,$var2) = $args->[0]->render($args); my ($var3,$var4) = $args->[1]->render($args); return ($var1 . ' ' . $var3 , $var2); } package Net::Z3950::RPN::Or; sub render { my ($args) = @_; my ($var1,$var2) = $args->[0]->render($args); my ($var3,$var4) = $args->[1]->render($args); return ($var1 . ' | ' . $var3 , $var2); } package Net::Z3950::RPN::AndNot; sub render { my ($args) = @_; my ($var1,$var2) = $args->[0]->render($args); my ($var3,$var4) = $args->[1]->render($args); return ($var1 . ' -' . $var3 , $var2); } =head1 NOTES =head2 SERVER MODES There are two methods for daemonization of this script: one provided by SimpleServer and one provided by OpenSRF. To use SimpleServer's Daemon Mode run this script with the -D option and the optional -d daemonname. To enable the OpenSRF daemonization, comment out the appropriate lines in the code. Not sure if Evergreen is thread safe, threaded mode is untested. =head2 MARC:: MODULES Make sure to use the unicode-friendly MARC::Record module that's on SourceForge (MARC::Record 2.0RC1 is what I've tested with). The 1.x series will have trouble calculating directory offsets in utf-8 encoded records and will result in some errors upon record retrieval. Evergreen uses a modifed version of MARC::File::XML. =head2 GENERAL Z39.50 INFO =head3 Z39.50 http://www.loc.gov/z3950/agency/ =head3 The Bath Profile The Bath Profile identifies a subset of specifications from the Z39.50 Information Retrieval Protocol (ANSI/NISO Z39.50/ISO 23950) for use in Z39.50 client and server software. The Bath Profile Version 1.0 (stable) can be found here: http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/stable1.html =head3 The Bib-1 Attribute Set http://www.loc.gov/z3950/agency/defns/bib1.html =head2 PQF->EVERGREEN QUERY MAPPING AUGMENTED CLASSES The Net::Z3950::RPN:: packages included map PQF queries to our native query language by overriding the 'render' method on each query tree node type (so-called 'Augmented Clases'). See the Net::Z3050::SimpleServer documentation for more details about the tree node types. The most obvious use of these augmented classes is for handling Boolean, which is pretty self-documenting in the classes themselves. In addition, we can use them to handle pretty much any PQF query that comes our way. You basically just call the render() method on an RPN tree and you get something else back. Call it thusly: my $query = eval { $args->{RPN}->{query}->render($args) }; if ($@) { # process errors in parsing query $args->{ERR_CODE} = [EMAIL PROTECTED]>{err_code}; if ([EMAIL PROTECTED]>{err_str}) { $args->{ERR_STR} = [EMAIL PROTECTED]>{err_str}; } return; } We then pass our now correctly-formatted $query variable on to perform a search in Evergreen. Our native query language in Evergreen (via an srfsh request) provides us with two query types: basic classed searching, and full MARC searching. The classed searches (open-ils.storage.metabib.$class.Zsearch) use MODS to distil title/author/subject/keyword/series: open-ils.storage.metabib.{title|author|subject|keyword|series}.Zsearch =head3 Basic Keyword Searching Example my $res = $ses->request( 'open-ils.storage.metabib.keyword.Zsearch.atomic', term => 'harry potter -stone', )->gather(1); For specific searches (mainly isbn at the moment) we resort to full MARC searching, something like: my $res = $ses->request( 'open-ils.storage open-ils.storage.metabib.full_rec.Zmulti_search', "searches", [ { "term": "0590353403", "restrict": [ {"tag": "020"} ] } ] )->gather(1); =head3 Full MARC Searching Example my $res = $ses->request( 'open-ils.storage.metabib.full_rec.Zmulti_search.atomic', searches => [ { term => "bloom harold", restrict => [ { tag => 100 }, { tag => 700 } ] }, { term => "shakespeare", restrict => [ { tag => 245, subfield => 'a' }, ] }, ] )->gather(1); OR support is available in the MARC Searching through a param called 'class_join'. It's set to AND' or 'OR', and it defaults to 'AND' but is global, not hierarchical as in PQF , so we're fudging it a bit until Evergreen supports the full tree. =head2 APPLICATION INVOCATION OPTIONS Supported options are: -T Start the server in threaded mode -a file Specify a file for dumping PDUs (for diagnostic purposes). The special name - (dash) sends output to stderr. -S Don't fork or make threads on connection requests. This is good for debugging, but not recommended for real operation: Although the server is asynchronous and non-blocking, it can be nice to keep a software malfunction (okay then, a crash) from affecting all current users. -1 Like -S but after one session the server exits. This mode is for debugging only . -T Operate the server in threaded mode. The server creates a thread for each connection rather than a fork a process. Only available on UNIX systems that offers POSIX threads. -s Use the SR protocol (obsolete). -z Use the Z39.50 protocol (default). This option and -s complement each other. You can use both multiple times on the same command line, between listener-specifications (see below). This way, you can set up the server to listen for connections in both protocols concurrently, on different local ports. -l file The logfile. -c config A user option that serves as a specifier for some sort of configuration, usually a filename. The argument to this option is transferred to member configname of the statserv_options_block. -f vconfig This specifies an XML file that describes one or more YAZ frontend virtual servers. -C fname Sets SSL certificate file name for server (PEM). -v level The log level. Use a comma-separated list of members of the set {fatal,debug,warn,log,malloc,all,none}. -u uid Set user ID. Sets the real UID of the server process to that of the given user. It's useful if you aren't comfortable with having the server run as root, but you need to start it as such to bind a privileged port. -w dir The server changes to this directory during before listening on incoming connections. This option is useful when the server is operating from the inetd daemon (see -i). -p pidfile Specifies that the server should write its Process ID to file given by pidfile . A typical location would be /var/run/yaz-ztest.pid. -i Use this to make the the server run from the inetd server (UNIX only). -D Use this to make the server put itself in the background and run as a daemon. If neither -i nor -D is given, the server starts in the foreground. -t minutes Idle session timeout, in minutes. -k size Maximum record size/message size, in kilobytes. -d daemon Set name of daemon to be used in hosts access file. See hosts_access(5) and tcpd(8). -m time-format Sets the format of time-stamps in the log-file. Specify a string in the input format to strftime(). A listener specification consists of a transport mode followed by a colon (:) followed by a listener address. The transport mode is either tcp, unix: or ssl. For TCP and SSL, an address has the form hostname | IP-number [: portnumber] The port number defaults to 9999 (non-standard Z39.50 port). The address is the filename of socket. For TCP/IP and SSL, the special hostname @ (at sign) is mapped to the address INADDR_ANY, which causes the server to listen on any local interface. =head1 VERSION Z3950Server.pl Version 0.01 =head1 REVISION HISTORY 0.01 2006-01-31 - Original version =head1 AUTHOR Joshua Ferraro <[EMAIL PROTECTED]> =head1 COPYRIGHT AND LICENSING Copyright (C) 2005 Georgia Public Library Service Joshua Ferraro <[EMAIL PROTECTED]> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. =cut
