catmandu, catmandu::fedoracommons, mods::record
The three (Perl) modules described below look pretty cool, as well as kewl -- an interface to read/write MODS, an interface to interact with Fedora Commons, and an interface to convert data from one thing to another. I can see how these can be useful tools in some of my work. Thank you. --Eric Lease Morgan

On Aug 6, 2013, at 2:59 AM, Patrick Hochstenbach patrick.hochstenb...@ugent.be wrote:

LibreCat
-=-=-=-=

LibreCat is an open collaboration of the university libraries of Lund, Ghent, and Bielefeld to create tools for library and research services. One of the toolkits we provide is called 'Catmandu' (http://search.cpan.org/~nics/Catmandu-0.5004/lib/Catmandu.pm), a suite of tools for doing ETL processing on library data. We provide tools to import data via JSON, YAML, CSV, MARC, SRU, OAI-PMH, and more. To transform this data we created a small DSL that librarians in our institutions use. We also make it very easy to store the results in MongoDB, ElasticSearch, or Solr, or to export them in various formats. We also created command line tools because we felt that in our daily jobs we were writing the same kind of ad hoc Perl scripts over and over for endless reports. E.g., to create a CSV file of all titles in a MARC export we say something like:

  $ catmandu convert MARC to CSV --fix 'marc_map(245,title); retain_field(record);' records.mrc

To get all titles from our institutional repository we say:

  $ catmandu convert OAI --url http://biblio.ugent.be/oai to JSON --fix 'retain_field(title)'

To store a MARC export in MongoDB we do:

  $ catmandu import MARC to MongoDB --database_name mydb --bag data records.mrc

Here is a blog post about the commands that are available: http://librecat.org/catmandu/2013/06/21/catmandu-cheat-sheet.html See our project page for more information about LibreCat and Catmandu: http://librecat.org and a tutorial on how to work with the API: http://librecat.org/tutorial/

MODS::Record
-=-=-=-=-=-=

In one of our Catmandu projects we created a Perl connector for Fedora Commons (http://search.cpan.org/~hochsten/Catmandu-FedoraCommons-0.24). One of our goals was to integrate better with the Islandora project. For this we needed a Perl MODS parser. As there was no module available on CPAN, we provide a top-level module, like MARC::Record, called MODS::Record: http://search.cpan.org/~hochsten/MODS-Record-0.05/lib/MODS/Record.pm. I hope this will be of some help to the community. If there are coders here who would like to contribute to the MODS package, please drop me a line. I think CPAN MODS support shouldn't be dependent on one coder, one institution. Greetings from a sunny Belgium, Patrick
Re: reading and writing of utf-8 with marc::batch [resolved; gigo]
Thank you for all the input; I think I have resolved my particular issue. Battle won. War still raging. Using the script suggested by Galen as a starting point, I wrote the following hack outputting integers denoting MARC records containing non-UTF-8 characters, but the script output nothing; all the data in all of my records was encoded as UTF-8:

  #!/usr/bin/perl

  # require
  use strict;
  use Encode;

  # initialize
  binmode STDIN, ':bytes';
  $/ = "\035";
  my $i = 0;

  # read STDIN
  while ( <STDIN> ) {

    # increment
    $i++;

    # check validity; decode() croaks on malformed UTF-8
    # (is_utf8() only checks Perl's internal flag, not the bytes)
    eval {
      my $octets  = $_;
      my $utf8str = Encode::decode( 'UTF-8', $octets, Encode::FB_CROAK );
    };

    # check for error
    if ( $@ ) { print "Record $i contains non-UTF-8 characters\n" }

  }

  # done
  exit;

Since all of the data in all of my records was UTF-8, all of the leaders of all of the records need to have a value of 'a' in position #9. So I wrote the following hack (circumventing MARC::Batch):

  #!/usr/bin/perl

  # require
  use strict;

  # initialize
  binmode STDIN,  ':bytes';
  binmode STDOUT, ':bytes';
  $/ = "\035";

  # loop through the input
  while ( <STDIN> ) {

    # do the work and output
    substr( $_, 9, 1 ) = 'a';
    print $_;

  }

  # done
  exit;

I then fed the output of my fix routine to my indexing routine, and all of my problems seemed to go away. GIGO? I'm still not sure, but I think deep within MARC::Batch some sort of encoding is observed, honored, and output. And when the denoted encoding is not true and things like binmode( FILE, ':utf8' ) get called, output gets munged. Again, I'm not sure. It is almost exhausting. -- Eric Morgan University of Notre Dame
Re: reading and writing of utf-8 with marc::batch [terminal]
On Mar 26, 2013, at 5:57 PM, Leif Andersson leif.anders...@sub.su.se wrote: my first guess would be your terminal is not utf8. While I'm not positive my terminal is doing UTF-8, I think it is. When I dump the data at the beginning, the output to the terminal is correct. After I run my script, the output to the same terminal is incorrect. -- Eric Lease Morgan
Re: reading and writing of utf-8 with marc::batch [double encoding]
A number of people have alluded to the problem of double encoding, and I'm beginning to think this is true. I have isolated a number of problem records. They all contain diacritics, but they do not have an 'a' in position #9 of the leader -- http://dh.crc.nd.edu/tmp/original.marc Can someone verify that the file contains UTF-8 characters for me? For these same records I have also added an 'a' in position #9 and created a similar file -- http://dh.crc.nd.edu/tmp/fixed.marc Is it true that original.marc is not denoted correctly, but fixed.marc is denoted correctly? -- Eric Morgan
Re: reading and writing of utf-8 with marc::batch [double encoding]
On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan emor...@nd.edu wrote: When it calls as_usmarc, I think MARC::Batch tries to honor the value set in position #9 of the leader. In other words, if that position is empty, then it tries to output records as MARC-8, and when it has a value of 'a', it tries to encode the data as UTF-8. How can I figure out whether or not a MARC record contains ONLY characters from the UTF-8 character set? Put another way, how can I determine whether or not position #9 of a given MARC leader is accurate? If position #9 is an 'a', then how can I read the balance of the record to determine whether or not all the characters really and truly are UTF-8 encoded? -- Eric This Is Almost Too Much For Me Morgan
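One way to answer both questions -- a sketch using only core modules, not anything from the original thread. The leader offset and the "\035" record terminator follow the MARC transmission format; decode() with FB_CROAK dies on malformed UTF-8 bytes, so it can validate a record regardless of what the leader claims:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Compare what the leader claims against what the bytes actually are.
# Position #9 of the leader is 'a' when the record claims UTF-8.
# Returns a diagnostic string, or undef when leader and bytes agree.
# (A copy is decoded because decode() with FB_CROAK may modify its input.)
sub check_marc_encoding {
    my ($raw) = @_;
    my $flag = substr( $raw, 9, 1 );
    my $is_valid_utf8 = eval {
        my $octets = $raw;
        decode( 'UTF-8', $octets, FB_CROAK );
        1;
    } ? 1 : 0;
    return "leader claims UTF-8 but record has invalid byte sequences"
        if $flag eq 'a' and not $is_valid_utf8;
    return "record is valid UTF-8 but leader does not say so"
        if $flag ne 'a' and $is_valid_utf8 and $raw =~ /[\x80-\xFF]/;
    return undef;
}

# typical use, with the MARC record terminator as input separator:
# local $/ = "\035";
# while ( my $raw = <STDIN> ) {
#     my $problem = check_marc_encoding($raw);
#     print "$problem\n" if defined $problem;
# }
```

Note that a pure-ASCII record passes either way, since ASCII is a subset of UTF-8; the check only flags records whose high bytes contradict the leader.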
senior programmer analyst
Notre Dame is hiring a Senior Programmer Analyst, and if you have any questions about the position do not hesitate to drop me a line. Job Description The Digital Access and Information Architecture Department of the University Libraries of Notre Dame is seeking a Senior Programmer Analyst. This position will have three types of responsibilities: 1) write and maintain object-oriented Perl programs, 2) provide systems administration services for a number of Linux-based platforms, and 3) actively participate in the general workings of the Department. The goal of the Department is to help the Libraries implement digital library collections and services. Some of the short-term projects this position provides support for may include: creating a portal for Catholic research materials, enhancing services applied against an institutional repository, supporting the campus-wide search engine, implementing a method for creating and disseminating the content of TEI files, supporting a local LOCKSS host, exploiting Web Services-based computing to acquire and disseminate information, and developing and supporting an open source software digital library system called MyLibrary. The ability to write Web-based computer programs in object-oriented Perl is a must, but other programming languages are desirable. The successful candidate must also know how to design a (MySQL) relational database, write a valid XML file given a DTD and/or XML schema, and securely administer small- to mid-sized computers running the Linux operating system. The Department is small, project-oriented, and collaborative in nature. The successful candidate must also possess good written and oral communication skills exemplified by the ability to listen and share ideas in many forms. Minimum Qualifications Candidates must have a college degree in Computer Science (or in a related field) and/or demonstrated professional experience. 
The person filling this position must have the ability to: * Write computer applications in object-oriented Perl. * Draw an entity-relationship diagram and therefore have the ability to normalize a relational database and execute SQL queries against a relational database. * Write well-formed and valid XML given an XML schema or DTD. * Listen to other people. * Communicate effectively in written and oral forms. * Work in a collaborative environment. * Troubleshoot and resolve critical hardware/software problems outside normal business hours. The incumbent must have at least three years of programming experience while working in a collaborative environment and have demonstrated their ability to document their efforts in written and oral form. Preferred Qualifications The ability to write computer programs in languages other than Perl (such as Javascript, Java, PHP, or C) is desirable. People who have worked in libraries or academia are encouraged to apply. A knowledge of the open source software development process is a plus. Position Pay Range - $3,560-$5,980/Month To apply go to http://jobs.nd.edu and search for requisition number 020060267. AA/EOE -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
open source tools and xml
I will be facilitating a half-day tutorial on open source software and XML at the upcoming Joint Conference on Digital Libraries (JCDL), and I thought some people here might want to attend. JCDL has the reputation for being a quality conference, and Chapel Hill (North Carolina) is a nice place to visit. Title Tutorial 11: Exploiting open source tools to create, maintain, and disseminate XML content Abstract XML is quickly becoming the means of marking up data for the purposes of transmitting information from one computer to another. While XML can be created by hand, the process is tedious and not necessarily scalable. Software systems can address this problem, and this tutorial enumerates, describes, and demonstrates ways open source software can be used to create, maintain, and disseminate XML. The goal of this tutorial is to increase participants' knowledge of these tools and to demonstrate how to take advantage of them in everyday digital library work and software development. Target Audience Software engineers and librarians/intermediate Presenter Eric Lease Morgan is the Head of the Digital Access and Information Architecture Department at the University Libraries of Notre Dame. He considers himself to be a librarian first and a computer user second. His professional goal is to discover new ways to use computers to provide better library service. Some of his more well-known investigations and implementations include MyLibrary and the Alex Catalogue of Electronic Texts. An advocate for open source software and open access publishing, Morgan has been freely distributing his software and publications for years before the terms open source and open access were coined. Morgan also hosts his own Internet domain, infomotions.com. http://jcdl2006.org/program/afternoon-tutorials -- Eric Morgan University Libraries of Notre Dame
mylibrary manual
I am happy and proud to announce the availability of the newest version of the MyLibrary manual called Designing, Implementing, and Maintaining Digital Library Services and Collections with MyLibrary. See: http://dewey.library.nd.edu/mylibrary/manual/ Code4Libers will enjoy it because the principles it puts forth can be applied to many digital library settings. OSS4Libers will enjoy it because it puts into practice free software as well as open access publishing. Perl4Libers will enjoy it because it is pure Perl. Beginning Perl scripters may benefit most from the tutorial. Something for everyone. About the book and who should read it The book is a manual, and its purpose is to outline the principles and processes necessary to implement digital library collections and services. It uses MyLibrary as an example, but the principles and processes can be applied to just about any digital library system or application. The manual is intended to be read by administrators who need to know what and how many resources to allocate to a digital library. It is intended to be read by librarians who are responsible for collecting and organizing content as well as ensuring the library's usability. The manual is intended to be read by systems administrators who are in charge of providing the technical infrastructure for the system. Last but not least, it is intended for programmers who will use the underlying Perl API to provide services against the collection. What the book contains and who helped write it The book's 200+ pages are distributed in two volumes and freely available in HTML and PDF formats. Co-written by seventeen excellent authors, the book elaborates upon digital library topics including information architecture, content standards, user-centered design, fundamental computer technologies, techniques for initial implementation and ongoing maintenance, and of course the MyLibrary Perl application programmer's interface. 
Here is an outline of the book's contents:

* Designing, Implementing, and Maintaining Digital Library Services and Collections with MyLibrary by Eric Lease Morgan (University of Notre Dame)
* Pioneering Portals: A History Of [EMAIL PROTECTED] by Keith Morgan (North Carolina State University)
* Information architecture
  o First Principles of Information Architecture: On your Mark. Get set. Go! not Fire, and then Aim. by Eric Lease Morgan (University of Notre Dame)
  o Facets and Terms in MyLibrary by Tom Lehman (University of Notre Dame)
* The Importance of Content Standards in Digital Libraries by Leslie Johnston (University of Virginia Library)
* User-centered design
  o Usability Testing: a Key to User-centered Designs by Terry Huttenlock (Wheaton College)
  o Surveys by Tom Lehman (University of Notre Dame)
  o Focus Group Interviews by Megan Johnson (Appalachian State University)
  o Attracting Users by Michael Yunkin (University of Nevada, Las Vegas)
  o Card Sorting by Terry Nikkel and Shelley McKibbon (Dalhousie University Libraries)
  o Paper Prototyping by Nora Dimmock (University of Rochester)
  o Low-cost Recording of Usability Tests by Martin Courtois (Kansas State University)
  o Communicating Usability Results by Brenda Reeb (University of Rochester)
  o Case Studies by Hal Kirkwood (Purdue University), Leslie Johnston (University of Virginia Library), and Alison Aldrich and Vishwam Annam (Wright State University Libraries)
* Underlying technologies
  o What is XML, and Why Should I Care? by Tod Olson (University of Chicago)
  o What are Relational Databases, and Why Should I Care? by Vishwam Annam (Wright State University Libraries)
  o What are Indexers, and Why Should I Care? by Peter Karman
* Implementation and Maintenance by Eric Lease Morgan (University of Notre Dame)
* MyLibrary Tutorial by Eric Lease Morgan (University of Notre Dame)
* The MyLibrary Perl API by Robert Fox (University of Notre Dame)

Colophon The book is licensed under the GNU Public License and is an example of open access publishing. Authors have retained copyrights to the things they have written. The manuscript was marked up in DocBook XML and transformed into HTML and PDF files using XSLT stylesheets, xsltproc, and fop. Questions, comments, corrections, criticisms, and clarifications are more than welcome. Send them to [EMAIL PROTECTED] -- Eric Lease Morgan and Team MyLibrary Manual
Re: dereferencing an array - Pt 2
On Feb 11, 2006, at 8:16 AM, Brad Baxter wrote:

  I have this sample data structure:

    my %profile = (
      'subjects' => {
        'astronomy' => {
          'telescope world' => 'http://telescope.com',
          'stars r us'      => 'http://websters.com',
          'asto magazine'   => 'http://oxford.edu'
        },
        'mathematics' => {
          '2 + 2 = 4'    => 'http://catalog.nd.edu',
          'math library' => 'http://worldcat.com'
        }
      },
      'tools' => {
        'dictionaries' => {
          'websters' => 'http://websters.com',
          'oxford'   => 'http://oxford.edu'
        },
        'catalogs' => {
          'und'      => 'http://catalog.nd.edu',
          'worldcat' => 'http://worldcat.com'
        }
      }
    );

  I now need to build %profile programmatically. As I loop through a set of information resources I can determine the following values: 1. resource name (ex: telescope world) 2. URL (ex: http://telescope.com) 3. term (ex: astronomy) 4. facet (ex: subjects) Given these values, how can I build %profile?

  Short answer:

    $profile{ $facet }{ $term }{ $resource } = $url;

Wow! Perfect!! I have been able to take what Jonathan Gorman, Bruce Van Allen, and Brad Baxter have given me and incorporate it into the beginnings of a patron-specific interface of MyLibrary. In MyLibrary patrons can be created and cataloged with facet/term combinations -- a controlled vocabulary. These same facet/term combinations are used to catalog information resources. Thus, through the controlled vocabulary I am able to create relationships between resources and patrons. The result is the display of a set of information resources designed for individuals with particular characteristics. For example, try the following URLs. Each points to a different patron with different characteristics, and each page provides the ability to display the information resources in an alphabetical or grouped view:

* Andrew Carnegie http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=194
* Leonardo da Vinci http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=191
* Galileo Galilei http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=193

Thanks guys. I have added your names to my code. -- Eric Morgan
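For the record, the short answer above works because of autovivification: Perl creates the intermediate hash references the first time they are used. A minimal sketch, with made-up sample tuples standing in for the loop over information resources:

```perl
use strict;
use warnings;

# Each tuple is [ facet, term, resource name, URL ], the four values
# determined while looping through the information resources.
my @resources = (
    [ 'subjects', 'astronomy',    'telescope world', 'http://telescope.com' ],
    [ 'subjects', 'mathematics',  'math library',    'http://worldcat.com'  ],
    [ 'tools',    'dictionaries', 'websters',        'http://websters.com'  ],
);

my %profile;
for my $tuple (@resources) {
    my ( $facet, $term, $resource, $url ) = @$tuple;

    # autovivification builds the nested hashes as needed
    $profile{$facet}{$term}{$resource} = $url;
}
```

No explicit creation of the inner hashes is needed; assigning through the full chain of keys is enough.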
dereferencing an array
How do I loop through a reference to an array? I have the following data structure:

  my %facets = (
    'audiences' => [ 'freshman', 'senior' ],
    'subjects'  => [ 'music', 'history' ],
    'tools'     => [ 'dictionaries', 'catalogs' ]
  );

I can use this code to get the keys for %facets:

  foreach my $key ( sort( keys( %facets ) ) ) { print $key, "\n" }

But since each value of %facets is a reference to an array, I don't know how to loop through the referenced array. -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
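A minimal sketch of the idiom being asked about: @{ ... } dereferences the array reference stored in each hash value, so a nested loop walks the lists. The sample %facets mirrors the structure in the question.

```perl
use strict;
use warnings;

my %facets = (
    'audiences' => [ 'freshman', 'senior' ],
    'subjects'  => [ 'music', 'history' ],
    'tools'     => [ 'dictionaries', 'catalogs' ]
);

foreach my $key ( sort keys %facets ) {
    print "$key\n";

    # @{ $facets{$key} } dereferences the stored array reference
    foreach my $item ( @{ $facets{$key} } ) {
        print "\t$item\n";
    }
}
```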
Re: dereferencing an array
On Feb 10, 2006, at 3:58 PM, Eric Lease Morgan wrote:

  Now I'm going to make each value in the referenced array a reference to a hash; I'm going to make my data structure deeper. 'More later.

Since that worked so well, I'll ask this question. Given the following data structure, how do I print out something like this:

  tools
    dictionaries
      websters - http://websters.com
      oxford - http://oxford.edu
    catalogs
      und - http://catalog.nd.edu
      worldcat - http://worldcat.com

  my %facets = (
    'tools' => [(
      'dictionaries' => [(
        'websters' => 'http://websters.com',
        'oxford'   => 'http://oxford.edu'
      )],
      'catalogs' => [(
        'und'      => 'http://catalog.nd.edu',
        'worldcat' => 'http://worldcat.com'
      )]
    )]
  );

This code doesn't cut it:

  foreach my $key ( sort( keys( %facets ) ) ) {
    print $key, "\n";
    foreach my $term ( @{ $facets{$key} } ) {
      print "\t", $term, "\n";
    }
  }

Is my data structure dumb? -- Eric Morgan
Re: dereferencing an array - Pt 2
On Feb 10, 2006, at 5:41 PM, Bruce Van Allen wrote:

  foreach my $facet_key ( keys %facets ) {
    print "$facet_key\n";
    my %sub_hash = %{ $facets{$facet_key} };
    foreach my $sub_key ( keys %sub_hash ) {
      print "\t$sub_key\n";
      my %inner_hash = %{ $sub_hash{$sub_key} };
      foreach my $inner_key ( keys %inner_hash ) {
        print "\t\t$inner_key - $inner_hash{$inner_key}\n";
      }
    }
  }

This has been VERY helpful, and I appreciate the assistance. Now I need to programmatically build the hash. I have this sample data structure:

  my %profile = (
    'subjects' => {
      'astronomy' => {
        'telescope world' => 'http://telescope.com',
        'stars r us'      => 'http://websters.com',
        'asto magazine'   => 'http://oxford.edu'
      },
      'mathematics' => {
        '2 + 2 = 4'    => 'http://catalog.nd.edu',
        'math library' => 'http://worldcat.com'
      }
    },
    'tools' => {
      'dictionaries' => {
        'websters' => 'http://websters.com',
        'oxford'   => 'http://oxford.edu'
      },
      'catalogs' => {
        'und'      => 'http://catalog.nd.edu',
        'worldcat' => 'http://worldcat.com'
      }
    }
  );

I use the following code, based on the good work of Bruce, to traverse %profile and output a set of nested HTML lists. It works for any size of %profile. Fun!

  print "<ul>";
  foreach my $facet ( sort( keys( %profile ) ) ) {
    print "<li>$facet";
    my %facets = %{ $profile{$facet} };
    print "<ul>";
    foreach my $term ( sort( keys( %{ $profile{$facet} } ) ) ) {
      print "<li>$term";
      my %terms = %{ $facets{$term} };
      print "<ol>";
      foreach my $resource ( sort( keys( %terms ) ) ) {
        print "<li><a href='$facets{$term}{$resource}'>$resource</a></li>";
      }
      print "</ol>";
      print "</li>";
    }
    print "</ul>";
    print "</li>";
  }
  print "</ul>";

I now need to build %profile programmatically. As I loop through a set of information resources I can determine the following values: 1. resource name (ex: telescope world) 2. URL (ex: http://telescope.com) 3. term (ex: astronomy) 4. facet (ex: subjects) Given these values, how can I build %profile? -- Eric Perl Data Structures Are Not My Forte Morgan
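The loops above fix the nesting depth at exactly three levels. A recursive walker handles a hash of hashes of any depth; this is a sketch of that generalization, not code from the thread, and the sample data is illustrative only:

```perl
use strict;
use warnings;

# Recurse through a nested hash of hashes, indenting one tab per level
# and printing leaf entries as "key - value".
sub walk {
    my ( $hash, $depth ) = @_;
    foreach my $key ( sort keys %$hash ) {
        my $value = $hash->{$key};
        if ( ref $value eq 'HASH' ) {
            print "\t" x $depth, "$key\n";
            walk( $value, $depth + 1 );    # descend one level
        }
        else {
            print "\t" x $depth, "$key - $value\n";
        }
    }
}

my %profile = (
    'tools' => {
        'dictionaries' => { 'websters' => 'http://websters.com' },
        'catalogs'     => { 'und'      => 'http://catalog.nd.edu' },
    },
);
walk( \%profile, 0 );
```

The ref() test is what lets the same function serve both branch and leaf levels, no matter how deep the structure grows.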
mylibrary portal
We have created an additional MyLibrary end-user interface that will be distributed with the Perl modules: http://dewey.library.nd.edu/mylibrary/portal/ The interface supports browse by facet and term, as well as search. The administrative interface allows you to add facets, terms, location types, and resources. It then provides the means to index the whole thing and make it accessible via SRU. The back-end can export the content of the database and make it available as an OAI data repository. Please bang on the Portal a bit and tell us what you think. Our next step will be to fix abnormalities and enhance the whole thing with documentation (PODs). -- Eric Lease Morgan University Libraries of Notre Dame (574) 631-8604
Re: mylibrary tutorial
I have all but finished my MyLibrary Tutorial: By the end of the tutorial the reader should be able to: create sets of facets, create sets of terms, create sets of librarians, create sets of location types, create sets of resources, classify librarians and resources with terms, work with sets of resources associated with particular sets of terms, output the resources' titles, descriptions and locations, create a freetext index of MyLibrary content, harvest OAI repositories and cache the content in a MyLibrary database. http://dewey.library.nd.edu/morgan/tutorial.txt We will be including this document in the upcoming MyLibrary Manual. -- Eric Lease Morgan University Libraries of Notre Dame
Re: cpan
On Jan 13, 2006, at 2:14 PM, Eric Lease Morgan wrote: Is it kosher to upload something like the MyLibrary Perl modules to CPAN, and if so, where would we put it? Based on feedback I've gotten on and off list as well as from postings to the perl.module-authors mailing list/newsgroup, I think I will upload MyLibrary to CPAN and create a top-level namespace. -- Eric Lease Morgan
mylibrary manual
I have gone the next step to writing a MyLibrary version 3.0 manual. See: http://dewey.library.nd.edu/mylibrary/manual/ As it stands right now, the manual covers the vast majority of the Perl API, and as soon as I figure out why my FO/PDF processor broke I will create a PDF version. FYI. -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
wanted: short-term Perl programmer
This is a want-ad for a short-term Perl programmer. Please share it as you see fit. Short-term Perl programmer The University Libraries of Notre Dame is seeking an expert Perl programmer to work on a short-term project for a professional salary. Description: The Libraries is involved in a national research and development activity. One of the activity's goals is to enhance an information retrieval system with a Find More Like This One feature. This feature will: 1. Allow users to identify a desirable record from a list of search results 2. Select characteristics from the record the user deems significant 3. Return those characteristics back to the system 4. The system will then use things like locally created dictionaries, WordNet, and/or other semantic tools to return additional searches to be applied against other internal or external indexes Requirements: The successful candidate must have exceptional skills in reading and writing object oriented Perl programs in a Unix/Linux environment. The position requires the candidate to be able to document their code with comments as well as in the form of PODs. The position requires the candidate to be able to work in a collaborative environment. Thus, the candidate must possess well-developed communication skills. Highly desirable: Applicants who demonstrate an understanding of relational database techniques, XML and Web Services, academia, as well as the principles of open source software will be given preference. Work environment: The University Libraries is located in Notre Dame, IN (just outside South Bend) about ninety miles east of Chicago. Because of the location, telecommuting is possible, but regular weekly site visits are necessary. Start date: Immediately End date: No later than August 31, 2005 Salary: Starting at $24/hour and negotiable depending on qualifications, experience, and flexibility Application: Send cover letters, resumes, and questions to Eric Lease Morgan ([EMAIL PROTECTED]). 
All inquiries will be acknowledged. -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
option items sorted in pop-up menus
Is there any way to use CGI.pm and still have option items sorted in pop-up menus? CGI.pm provides cool ways to create HTML forms. To create a pop-up menu, I can do something like this:

  # create a hash of terms
  my @terms = MyLibrary::Term->get_terms( sort => 'name' );
  my %terms;
  foreach ( @terms ) { $terms{ $_->term_id } = $_->term_name }
  ...
  $html .= $cgi->popup_menu( -name => 'id', -values => \%terms );

where: -name is the name of the parameter I want returned, and -values is a reference to a hash containing id and value pairs. The problem is that -values is a hash, and when CGI.pm displays the pop-up menu the items do not come out sorted. (They were inserted into the hash in a sorted order.) Is there some way I can get CGI.pm to output the popup items in sorted order, or should I write my own little function to do this for me? -- Eric Lease Morgan University Libraries of Notre Dame
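For what it's worth, CGI.pm's popup_menu() preserves the order of an array reference passed as -values, and a -labels hash maps each value to its display text -- so the usual answer is to sort the ids by name and pass both. A sketch with made-up term data (the real ids and names would come from MyLibrary::Term as above):

```perl
use strict;
use warnings;

# illustrative id => name pairs, standing in for the %terms hash built
# from MyLibrary::Term objects in the post above
my %terms = ( 3 => 'astronomy', 1 => 'mathematics', 2 => 'dictionaries' );

# order the ids by their term names, not by id
my @ids = sort { $terms{$a} cmp $terms{$b} } keys %terms;

# then, with a CGI object in hand, the menu comes out sorted:
# $html .= $cgi->popup_menu( -name => 'id', -values => \@ids, -labels => \%terms );

print join( ',', @ids ), "\n";
```

Hash key order is unpredictable, which is why passing \%terms directly as -values loses the sort; the array reference is what carries the ordering.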
Re: adding a MARC tag called SYS
On Jul 30, 2004, at 8:56 AM, Eric Lease Morgan wrote: Besides the fact that the addition of a field named SYS may be a feature of my integrated library system, how can I add such a field to my data? Well, now the whole thing is a moot point in my book. Instead of using a kewl SYS field in my records, I stuffed my data into 035 subfield y. 'Sorry for the fuss. -- Eric Lease Morgan University Libraries of Notre Dame
adding a MARC tag called SYS
How can I add a MARC tag called SYS to a set of MARC records? I want to loop through a set of MARC records, extract the last nine characters of the 001 field, add a new field to each record called SYS, and output the resulting data to a new file. I have the following code snippet that does the work:

  # read each record
  while ( my $record = $batch->next() ) {

    # get 001 field
    my $field = $record->field('001')->as_string;

    # create a system number, the last nine characters of $field
    my $sysno = substr $field, -9, 9;

    # add a sys tag
    $record->append_fields( MARC::Field->new( 'SYS', $sysno ) );

    # write to STDOUT
    print $record->as_usmarc();

  }

Alas, MARC::Field says I need to include a subfield in the SYS field, but that is not what I want. I want the SYS field to contain no indicators nor subfields. I want it to be just like the normal MARC tags with values less than 010. Besides the fact that the addition of a field named SYS may be a feature of my integrated library system, how can I add such a field to my data? -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
Re: xml::libxslt [resolved]
On Jul 14, 2004, at 11:01 AM, Randy Kobes wrote: Second, Perl returns this in the terminal: Can't load 'C:/Perl/site/lib/auto/XML/LibXSLT/LibXSLT.dll' for module XML::LibXSLT: load_file:The specified procedure could not be found at C:/Perl/lib/DynaLoader.pm line 230. at C:\xml\bin\xsltproc.pl line 16 Compilation failed in require at C:\xml\bin\xsltproc.pl line 16. BEGIN failed--compilation aborted at C:\xml\bin\xsltproc.pl line 16. Where can I get another copy of XML::LibXSLT for ActiveState ActivePerl? I'm not aware of one ... This problem usually means that your system has a version of the external dlls needed by the Perl modules that is incompatible with the dlls the Perl modules were compiled with. What I would suggest is to uninstall XML-LibXML, XML-LibXML-Common, and XML-LibXSLT, and then reinstall them, making sure that the post-install scripts run by XML-LibXML-Common (to install libxml2.dll) and XML-LibXSLT (to install libxslt-related dlls) are successfully run. Even if the post-install scripts do find a copy of the needed dlls on your system, have them fetch and install the dlls anyway, just to make sure the correct versions are available. And adjust your PATH environment variable to make sure the directory these dlls live in is searched before other directories that may contain other versions of the dlls. Thank you. This is what I needed to know. After I got rid of the extraneous *.dll files (specifically the libxml2.dll installed by Swish-e) I was able to run scripts written with XML::LibXML and XML::LibXSLT successfully. What's more, my swish-e programs still work as desired. Whew! Thank you, and the open source software + mailing list combination comes through yet again. -- Eric Lease Morgan University Libraries of Notre Dame (574) 631-8604
xml::libxslt on windows
Have you gotten XML::LibXSLT to work on Windows, and if so, then how? I have installed ActiveState Perl version 5.8.4 build 810 on my Windows computer. It resides in c:\Perl. I configured ppm to point to an additional repository at http://theoryx5.uwinnipeg.ca/ppms/ where I found XML::LibXML ppm files. I got XML::LibXML to work just fine, but XML::LibXSLT fails like this: Can't load 'C:/Perl/site/lib/auto/XML/LibXSLT/LibXSLT.dll' for module XML::LibXSLT: load_file:The specified procedure could not be found at C:/Perl/lib/DynaLoader.pm line 230. at C:\xml\bin\xsltproc.pl line 16 Compilation failed in require at C:\xml\bin\xsltproc.pl line 16. BEGIN failed--compilation aborted at C:\xml\bin\xsltproc.pl line 16. Windows also spits this out in a dialog box: The procedure entry point xmlDictCreateSub could not be located in the dynamic link library libxml2.dll Do y'all have any hints on how I can resolve this problem? -- Eric Lease Morgan University Libraries of Notre Dame (574) 631-8604
RE: using xml::libxml to find replace in xml documents
I wrote:

  Has anybody here written one or more Perl scripts using XML::LibXML to find replace in XML documents?...

Thank you for all the replies. One person recommended a Perl module called XML::Twig. Another person recommended I use regular expressions. Two people recommended the use of XSLT. One of these provided sample code. The other wrote a full-blown program! (Thanks, Andrew Houghton!) In the end I re-read some of my Perl/XML books and decided to write a SAX filter using XML::SAX::ParserFactory. Such a filter has the following shape:

  use strict;
  use XML::SAX::ParserFactory;

  my $handler = MyHandler->new();
  my $parser  = XML::SAX::ParserFactory->parser( Handler => $handler );
  $parser->parse_uri( $ARGV[0] );
  exit;

  package MyHandler;

  sub new {
    my $type = shift;
    return bless {}, $type;
  }

  sub start_element {
    my ( $self, $element ) = @_;
    print "Starting element $element->{Name}\n";
  }

  sub end_element {
    my ( $self, $element ) = @_;
    print "Ending element $element->{Name}\n";
  }

  sub characters {
    my ( $self, $characters ) = @_;
    print "characters: $characters->{Data}\n";
  }

  1;

I have saved my script at the following location: http://infomotions.com/musings/getting-started/fix-ead.txt The script will eventually be part of a workshop I am giving called Shining a LAMP on XML. The outline, to date, is here: http://infomotions.com/musings/getting-started/LAMP.txt 'More later. -- Eric Lease Morgan
using xml::libxml to find replace in xml documents
Has anybody here written one or more Perl scripts using XML::LibXML to find replace in XML documents? I have a set of 700 XML files. Each one has an incorrect attribute value in a processing instruction, a few invalid attributes in a particular element, and a set of elements that are no longer valid against the DTD. I want to use XML::LibXML to clean up these files, and I'm hoping someone out there has already done this to some extent and can share their code. While the XML::LibXML modules are very functional, I wish they had more examples in their PODs. -- Eric Lease Morgan University Libraries of Notre Dame
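For what it's worth, the sort of cleanup described above can be sketched directly with XML::LibXML's DOM interface. This is only an illustration, not code from the actual project; the element and attribute names (foo, bar, audience, obsolete) are hypothetical placeholders standing in for the real EAD names:

```perl
#!/usr/bin/perl
# Hypothetical sketch: clean up an XML file in place with XML::LibXML.
# The names foo, bar, audience, and obsolete are placeholders only.
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $doc    = $parser->parse_file( $ARGV[0] );

# fix an incorrect attribute value on a particular element
$_->setAttribute( 'audience', 'external' )
    for $doc->findnodes('//foo[@audience]');

# strip an attribute that is invalid against the DTD
$_->removeAttribute('bar') for $doc->findnodes('//foo[@bar]');

# remove elements that are no longer valid
$_->unbindNode for $doc->findnodes('//obsolete');

# write the cleaned-up document back out
$doc->toFile( $ARGV[0], 0 );
```

The same approach does not touch the processing instruction; those can be matched with a `processing-instruction()` XPath expression and replaced as nodes.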
berkeleydb xml
Does anybody here have any experience with BerkeleyDB XML? I have finally gotten it to compile and installed, and I'm intrigued with the idea of native XML database, especially combined with the use of Perl and its included Perl API. For more information see: http://www.sleepycat.com/xmldocs/ref_xml/toc.html -- Eric Lease Morgan University Libraries of Notre Dame
STDIN as well as command line input
How do I get a Perl program to accept input from STDIN as well as from the command line? I have a program (foo.pl) that is designed to read the contents of @ARGV and process each item in the array. Tastes great. Less filling. So, when I do something like this, things work just fine: % foo.pl a b c d e f g I have another program (bar.pl) that prints to STDOUT. The output is the same sort of data needed by foo.pl. So, I thought I'd give this a whirl: % bar.pl | foo.pl But alas, foo.pl never seems to get the input sent from bar.pl. It does not seem to read from STDIN. What should I do to my first program (foo.pl) so it can accept command line input as well as input from STDIN? -- Eric Lease Morgan (574) 631-8604
Re: STDIN as well as command line input
On Apr 26, 2004, at 10:43 AM, Andy Lester wrote: How are you reading from the files? Opening them yourself one at a time? Don't. Use the magic filehandle. On Apr 26, 2004, at 10:44 AM, Dennis Boone wrote: If your perl script is structured like this: while (<>) { # process } then perl will process stdin if no files are named, or the contents of each file named on the command line in sequence. Alas, my inputs are not the names of files. They are scalars, like this: plato-cratylus-1072532262 plato-charmides-1072462708 bacon-new-1072751992 -- Eric (574) 631-8604
Re: STDIN as well as command line input
On Apr 26, 2004, at 10:53 AM, Michael McDonnell wrote: This sort of situation can be dealt with using back ticks: foo.pl `bar.pl` This is nice in that you can probably do this too: foo.pl a b c `bar.pl` d e f g h `bar.pl x y z` i j k A popular GNUism might be helpful here as well. Many GNU programs use an optional command line argument of -- to indicate that input should be taken from STDIN instead of from other command line arguments. The back ticks solution works well. Thank you. I will see about modifying my code to get smart about -- arguments. Again, thanks. -- Eric (574) 631-8604
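One common way to make foo.pl accept both kinds of input, sketched here under the assumption that the items are whitespace-separated scalars like the ones shown earlier in the thread: use @ARGV when it is non-empty, and fall back to reading STDIN otherwise.

```perl
#!/usr/bin/perl
# Sketch: accept items from the command line, or from STDIN when
# no command line arguments were given.
use strict;
use warnings;

sub gather_items {
    my (@args) = @_;
    return @args if @args;    # command line input wins if present
    my @items;
    while ( my $line = <STDIN> ) {
        chomp $line;
        push @items, split ' ', $line;
    }
    return @items;
}

my @items = gather_items(@ARGV);
print "processing: $_\n" for @items;
```

With something like this in place, both `foo.pl a b c` and `bar.pl | foo.pl` behave the same way, with no back ticks required.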
Re: automagically create browsable POD pages [pod2html]
On Apr 1, 2004, at 10:33 AM, Eric Lease Morgan wrote: Is there some sort of make command I can run that will read the PODs in my distribution, turn them into (X)HTML files, and save them in a specified local directory of my distribution's filesystem? Thank you for the prompt replies, but the suggestions are overkill. I simply want to: 1. create a doc directory 2. loop through my lib directory looking for pods 3. convert each pod to xhtml 4. save converted files to the doc directory I think I will write a local wrapper to pod2html. BTW, pod2html looks like it will already do this, but I can't figure out how to make it: 1. create a single xhtml file for each pod 2. give each file a specific name Yeah, I could do this pod by pod, but I'm lazy. -- Eric Morgan
Re: automagically create browsable POD pages [pod2html]
On Apr 2, 2004, at 6:55 AM, Eric Lease Morgan wrote: Thank you for the prompt replies, but the suggestions are overkill. I simply want to: 1. create a doc directory 2. loop through my lib directory looking for pods 3. convert each pod to xhtml 4. save converted files to the doc directory Like this, but there has got to be a better way:

  #!/usr/bin/perl

  use File::Basename;
  use File::Find;

  my $POD2HTML = 'pod2html';
  my $IN  = $ARGV[0];
  my $OUT = $ARGV[1];

  find(\&process_files, $IN);
  exit;

  sub process_files {

      # get the name of the found file
      my $file = $File::Find::name;

      # make sure it has the correct extension
      return if ($file !~ m/\.pm$/);

      # extract the necessary parts of the file name
      (my $name, my $path, my $suffix) = File::Basename::fileparse($file, '\..*');

      my $cmd = "$POD2HTML --outfile=$OUT/$name.html --title=$name $file";
      print "$cmd\n";
      system $cmd;

  }

-- Eric Lease Morgan
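A pure-Perl alternative that avoids shelling out to pod2html for every file: Pod::Simple::XHTML (bundled with modern Perls) can do the conversion in-process. A sketch, assuming the same two command line arguments as above, a lib directory to search and a doc directory to write into:

```perl
#!/usr/bin/perl
# Sketch: convert every .pm under a lib directory to XHTML files in a
# doc directory, using Pod::Simple::XHTML instead of the pod2html CLI.
use strict;
use warnings;
use File::Basename;
use File::Find;
use Pod::Simple::XHTML;

my ( $in, $out ) = @ARGV;

find( sub {

    return unless /\.pm$/;
    my $file = $File::Find::name;
    my ($name) = fileparse( $file, qr/\.pm$/ );

    # render the POD to a string of XHTML
    my $pod = Pod::Simple::XHTML->new;
    $pod->output_string( \my $html );
    $pod->parse_file($file);

    # save the converted file with a specific name
    open my $fh, '>', "$out/$name.html" or die "can't write $out/$name.html: $!";
    print {$fh} $html;
    close $fh;

}, $in ) if $in && $out;
```

Because nothing is forked per file, this tends to be noticeably faster than a pod2html wrapper on a large lib directory.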
automagically create browsable POD pages
Is there some sort of incantation I can send to a Perl-generated Makefile in order to automagically create browsable POD pages? Here at MyLibrary Central we have been re-writing MyLibrary. We are using the following technique: 1. Write POD. 2. Write tests. 3. Write module. 4. Go to Step #1 until tests pass and module is complete. 5. Write scripts using module. The process works great, and that is an understatement. Through this process we have created bunches o' PODs, and we want to share them with the world. Release early. Release often. Is there some sort of make command I can run that will read the PODs in my distribution, turn them into (X)HTML files, and save them in a specified local directory of my distribution's filesystem? -- Eric Morgan University Libraries of Notre Dame (574) 631-8604
net::z3950
If I have installed ActiveState Perl on my Windows computer, then how do I install net::z3950? When I run ppm and then type 'install net-z3950' I get a message saying that 45 bytes were downloaded, things were successful, but there are no new modules in my path. Has anybody here installed Net::Z3950 on Windows? -- Eric Lease Morgan Head, Digital Access and Information Architecture Department University Libraries of Notre Dame (574) 631-8604
Really Rudimentary Catalog
On 1/5/04 11:34 AM, Eric Lease Morgan [EMAIL PROTECTED] wrote: My book catalog excels at inventorying my collection. It does a very poor job at recommending/suggesting what book(s) to use. The solution is not with more powerful search features, nor is it with bibliographic instruction. The solution lies in better, more robust data, as well as access to the full text. This is not just a problem with my catalog. It is a problem with online public access catalogs everywhere, but I deviate. I'm off topic. All of this is fodder for my book catalog's About text. I have packaged up my implementation of the book catalog (Really Rudimentary Catalog), made the Perl source code available, and re-articulated my ideas about the limitations of traditional library catalogs here in the system's About text: http://infomotions.com/books/?cmd=about -- Eric Lease Morgan University Libraries of Notre Dame
Re: Net::Z3950 and diacritics [book catalogs]
On 12/16/03 8:57 AM, Eric Lease Morgan [EMAIL PROTECTED] wrote: Upon further investigation, it seems that MARC::Batch is not necessarily causing my problem with diacritics; instead, the problem may lie in the way I am downloading my records using Net::Z3950 Thank you to everybody who replied to my messages about MARC data and Net::Z3950. I must admit, I still don't understand all the issues. It seems there are at least a couple of character sets that can be used to encode MARC data. The characters in these sets are not always 1 byte long (specifically the characters with diacritics), and consequently the leader of my downloaded MARC records was not always accurate, I think. Again, I still don't understand all the issues, and the discrepancy is most likely entirely my fault. I consider my personal catalog about 80% complete. I have about another 200 books to copy catalog, and I can see a few more enhancements to my application, but they will not significantly increase the system's functionality. I consider those enhancements to be featuritis. Using my Web browser I can catalog about two books per minute. In any event, the number of book descriptions from my personal catalog containing diacritics is very small. Tiny. Consequently, my solution was to either hack my MARC records to remove the diacritic or skip the inclusion of the record altogether. The process of creating my personal catalog was very enlightening. The MARC records in my catalog are very very similar to the records found in catalogs across the world. My catalog provides author, title, and subject searching. It provides Boolean logic, nested queries, and right-hand truncation. The entire record is free-text searchable. Everything is accessible. The results can be sorted by author, title, subject, and rank (statistical relevance).
A cool search is a search for cookery: http://infomotions.com/books/?cmd=search&query=cookery Yet, I still find the catalog lacking, and what it is lacking is/are three things: 1) more descriptive summaries like abstracts, 2) qualitative judgments like reviews and/or the number of uses (popularity), and 3) access to the full text. These are problems I hope to address in my developing third iteration of my Alex Catalogue: http://infomotions.com/alex2/ My book catalog excels at inventorying my collection. It does a very poor job at recommending/suggesting what book(s) to use. The solution is not with more powerful search features, nor is it with bibliographic instruction. The solution lies in better, more robust data, as well as access to the full text. This is not just a problem with my catalog. It is a problem with online public access catalogs everywhere, but I deviate. I'm off topic. All of this is fodder for my book catalog's About text. Again, thank you for the input. -- Eric Lease Morgan University Libraries of Notre Dame
Re: Extracting data from an XML file
I wrote: Can you suggest a fast, efficient way to use Perl to extract selected data from an XML file?... First of all, thank you everyone who promptly replied to my query. Second, I was not quite clear in my question. Many people said I should write an XSLT style sheet to transform my XML document into HTML. This is in fact what I do, but I was not clear in my question. I need a process to not only transform each of my documents, but also to create author as well as title indexes to my collection, and therefore I need to extract bits of data from each of my original XML files. Third, most of the replies fell into two categories: 1) use an XSLT style sheet as a sort of subroutine, and 2) use XML::Twig. Fourth, I tried both of these approaches plus my own, and timed them. I had to process 1.5 MB of data in nineteen files. Tiny. Ironically, my original code was the fastest at 96 seconds. The XSLT implementation came in second at 101 seconds, and the XML::Twig implementation, while straightforward, came in last at 141 seconds. (See the attached code snippets.) Since my original implementation is still the fastest, and the newer implementations do not improve the speed of the application, I must assume that the process is slow because of the XSLT transformations themselves. These transformations are straightforward:

  # transform the document and save it
  my $doc = $parser->parse_file($file);
  my $results = $stylesheet->transform($doc);
  my $html_file = "$HTML_DIR/$id.html";
  open OUT, ">$html_file";
  print OUT $stylesheet->output_string($results);
  close OUT;

  # convert the HTML to plain text and save it
  my $html = parse_htmlfile($html_file);
  my $text_file = "$TEXT_DIR/$id.txt";
  open OUT, ">$text_file";
  print OUT $formatter->format($html);
  close OUT;

When my collection grows big I will have to figure out a better way to batch transform my documents. I might even have to break down and write a shell script to call xsltproc directly. (Blasphemy!)
-- Eric Lease Morgan University Libraries of Notre Dame

  # my original code
  print "Processing $file...\n";
  my $doc    = $parser->parse_file($file);
  my $root   = $doc->getDocumentElement;
  my @header = $root->findnodes('teiHeader');
  my $author = $header[0]->findvalue('fileDesc/titleStmt/author');
  my $title  = $header[0]->findvalue('fileDesc/titleStmt/title');
  my $id     = $header[0]->findvalue('fileDesc/publicationStmt/idno');
  print "author: $author\n title: $title\n id: $id\n\n";

  # using an XSLT stylesheet
  print "Processing $file...\n";
  my $style      = $parser->parse_file($AUTIID);
  my $stylesheet = $xslt->parse_stylesheet($style);
  my $doc        = $parser->parse_file($file);
  my $results    = $stylesheet->transform($doc);
  my $fullResult = $stylesheet->output_string($results);
  my @fullResult = split /#/, $fullResult;
  my $title  = $fullResult[0];
  my $author = $fullResult[1];
  my $id     = $fullResult[2];
  print "author: $author\n title: $title\n id: $id\n\n";

  # using XML::Twig
  print "Processing $file...\n";
  my ($author, $title, $id);
  my $twig = new XML::Twig(TwigHandlers => {
      'teiHeader/fileDesc/titleStmt/author'     => sub { $author = $_[1]->text },
      'teiHeader/fileDesc/titleStmt/title'      => sub { $title  = $_[1]->text },
      'teiHeader/fileDesc/publicationStmt/idno' => sub { $id     = $_[1]->text }});
  $twig->parsefile($file);
  print "author: $author\n title: $title\n id: $id\n\n";
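One small thing that might help the XSLT timing above: in the stylesheet-based snippet, parse_stylesheet runs once per file, while the compiled stylesheet can be built once and reused for every document. A sketch of the hoisted version; 'style.xsl' is a placeholder for the real stylesheet path, and whether this closes the 96-versus-101-second gap would have to be measured:

```perl
#!/usr/bin/perl
# Sketch: parse and compile the XSLT stylesheet exactly once, then
# reuse it for every input file. 'style.xsl' is a placeholder.
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new;
my $xslt   = XML::LibXSLT->new;

# the expensive step, done once instead of once per file
my $stylesheet = $xslt->parse_stylesheet( $parser->parse_file('style.xsl') );

foreach my $file (@ARGV) {
    my $doc     = $parser->parse_file($file);
    my $results = $stylesheet->transform($doc);
    print $stylesheet->output_string($results);
}
```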
Re: constructing a Z39.50 search
On 12/7/03 10:04 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote: Can you tell me how to construct a particular Z39.50 query? Specifically, how do I create a Library of Congress card number search or a MARC tag 001 search? [EMAIL PROTECTED] wrote: In order to construct a Z39.50 query for LC control number (e.g., 2002012345), transmit the Use attribute value 9. This will map to appropriate internal searches. We support either a keyword or a left-anchored LCCN search. Therefore, your intersite query will be supported if it contains only the Use attribute (we'll default to a keyword search), or you could also include additional attributes. Several examples follow:

  Use  Relation  Position  Structure  Truncation  Completeness
  9    none      none      none       none        none
  9    3         3         2          100         1
  9    3         1         1          1           1
  9    none      3         none       none        none
  9    none      1         none       none        none

With regard to the MARC 001 search (i.e., local control number -- Voyager record ID), our server supports this if you transmit Use attribute value 12. *HOWEVER*, there is a bug in Voyager servers that you need to know about. If your application maintains a connection (i.e., is not stateless), Use attribute 12 will break subsequent searches in the session. The record requested by the local control number search (e.g., 123456) will be returned correctly, but all subsequent searches in the session will return the same record. I'd recommend using the LCCN search instead. Larry E. Dixson Internet: [EMAIL PROTECTED] Network Development and MARC Standards Office, LM639 Library of Congress Telephone: (202) 707-5807 Washington, D.C. 20540-4402 Fax: (202) 707-0115 Larry's answer was perfect. All I have to do is hack my script like this for an LC Card Number search: # initialize a z39.50 query my $query = '@attr 1=9 ' . pop(); Or this for a Voyager control number (key) search: # initialize a z39.50 query my $query = '@attr 1=12 ' . pop(); To make this happen I will add a couple of switches to the command-line input, and I'm set. Cool!
These new fangled Internet and mailing list things work great!! -- Eric Lease Morgan University Libraries of Notre Dame P.S. To whoever passed my original query along to Larry, I say, Thank you.
constructing a Z39.50 search
Can you tell me how to construct a particular Z39.50 query? Specifically, how do I create a Library of Congress card number search or a MARC tag 001 search? I am writing a simple online public access catalog. For a good time try: http://infomotions.com/opac/?cmd=search&query=civil* http://infomotions.com/opac/?cmd=search&query=perl To build my catalog I have written a command-line driven, brain-dead acquisition application based on the MARC::Record tutorials. The acquisition application takes ISBN numbers as input, searches them against the Library of Congress's database, returns MARC records, and saves them accordingly. My application contains this snippet of code:

  # check for command line input, a string of ISBN numbers
  exit if ($#ARGV < 0);

  # initialize a z39.50 query
  my $query = '@attr 1=7 ' . pop();

  # populate the query with ISBN numbers
  while (@ARGV) { $query = '@or @attr 1=7 ' . pop() . ' ' . $query }

This process works just fine, but every once in a while the queries fail because more than one record matches an ISBN number. Consequently, I need to be more specific. Maybe I could build a MARC tag 001 search. What does the 001 field contain? Is it normally indexed? If so, how can I create a $query to specify a search against this field? At the same time, ISBN numbers sometimes fail because they are not found in the data at all. In these cases I want to create an LC card number search. Can somebody tell me how to create a $query to specify a card number search? -- Eric Lease Morgan University Libraries of Notre Dame
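The @or-chaining shown above generalizes nicely into a helper that could sit behind command line switches. This is a hypothetical sketch, not code from the actual script: only the ISBN Use attribute (1=7) comes from the snippet above, while the lccn (9) and local-number (12) values are the ones Larry Dixson suggests elsewhere in this thread.

```perl
#!/usr/bin/perl
# Hypothetical sketch: build a prefix (PQF) query for any number of terms.
# Use attribute values: 7 = ISBN (from the script above); 9 (LCCN) and
# 12 (local number) as suggested elsewhere in the thread.
use strict;
use warnings;

my %use_attr = ( isbn => 7, lccn => 9, local => 12 );

sub build_query {
    my ( $type, @terms ) = @_;
    my $use = $use_attr{$type} or die "unknown search type: $type\n";
    my $query = "\@attr 1=$use " . pop @terms;
    $query = "\@or \@attr 1=$use " . pop(@terms) . " $query" while @terms;
    return $query;
}

# for example, two ISBNs OR'ed together:
print build_query( 'isbn', '0596000278', '0596004923' ), "\n";
# prints: @or @attr 1=7 0596000278 @attr 1=7 0596004923
```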
Re: test for an array
Thank you for the prompt replies. On 9/30/03 5:18 PM, Andy Lester [EMAIL PROTECTED] wrote: Rather than thinking of it in terms of what the function should return, think of it in terms of what you're expecting. For example, you might:

  my @vars = $librarian->term_ids( ... );
  is( scalar @vars, 3, 'Got back three items' );
  is( $vars[0], 'Smith', 'Checking name' );
  is( $vars[1], 'HR128', 'Checking homeroom' );
  ok( !$vars[2], 'Checking that Smith is NOT the principal' );

I can implement this, and thinking of what you're expecting makes a lot of sense. Thank you. -- Eric Morgan
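Fleshed out into a complete, runnable test file, the suggestion above looks something like this. The term_ids below is a contrived stand-in for the real $librarian->term_ids method, so the expected values are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 4;

# contrived stand-in for the real $librarian->term_ids method
sub term_ids { return ( 'Smith', 'HR128', undef ) }

my @vars = term_ids();
is( scalar @vars, 3, 'Got back three items' );
is( $vars[0], 'Smith', 'Checking name' );
is( $vars[1], 'HR128', 'Checking homeroom' );
ok( !$vars[2], 'Checking that Smith is NOT the principal' );
```

Saved as a .t file, this runs under prove or a Makefile-generated `make test` like any other distribution test.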