catmandu, catmandu::fedoracommons, mods::record,

2013-08-07 Thread Eric Lease Morgan

The three (Perl) modules described below look pretty cool, as well as kewl -- 
an interface to read/write MODS, an interface to interact with Fedora Commons, 
an interface to convert data from one thing to another. I can see how these can 
be useful tools in some of my work. Thank you. --Eric Lease Morgan


On Aug 6, 2013, at 2:59 AM, Patrick Hochstenbach 
patrick.hochstenb...@ugent.be wrote:

 LibreCat
 -=-=-=-=
 
 LibreCat is an open collaboration of the university libraries of Lund,
 Ghent, and Bielefeld to create tools for library and research services.
 One of the toolkits we provide is called 'Catmandu'
 (http://search.cpan.org/~nics/Catmandu-0.5004/lib/Catmandu.pm) which is
 a suite of tools to do ETL processing on library data. We provide tools
 to import data via JSON, YAML, CSV, MARC, SRU, OAI-PMH and more. To
 transform this data we created a small DSL that librarians use in our
 institutions. We also make it very easy to store the results in MongoDB,
 ElasticSearch, or Solr, or to export them in various formats.
 
 We also created command-line tools because we felt that in our daily jobs
 we were writing the same type of ad hoc Perl scripts over and over for
 endless reports. 
 
 E.g. to create a CSV file of all titles in a MARC export we say something 
 like:
 
 $ catmandu convert MARC to CSV --fix 'marc_map(245,title); retain_field(title)' < records.mrc
 
 To get all titles from our institutional repository we say:
 
 $ catmandu convert OAI --url http://biblio.ugent.be/oai to JSON --fix 'retain_field(title)'
 
 To store a MARC export into a MongoDB we do:
 
 $ catmandu import MARC to MongoDB --database_name mydb --bag data < records.mrc
 
 Here is a blog post about the commands that are available: 
 http://librecat.org/catmandu/2013/06/21/catmandu-cheat-sheet.html
 
 See our project page for more information about LibreCat and Catmandu: 
 
 http://librecat.org
 
 and a tutorial on how to work with the API
 
 http://librecat.org/tutorial/
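 
 For Perl coders, the same pipelines can be driven from the API. A minimal
 sketch, assuming Catmandu, Catmandu::MARC, and Catmandu::Store::MongoDB are
 installed (it mirrors the import command above, but is not taken verbatim
 from the tutorial):
 
   use Catmandu;
 
   # stream MARC records from a file
   my $importer = Catmandu->importer('MARC', file => 'records.mrc');
 
   # connect to the 'data' bag of a MongoDB store
   my $bag = Catmandu->store('MongoDB', database_name => 'mydb')->bag('data');
 
   # persist each record as a document in MongoDB
   $importer->each(sub { $bag->add(shift) });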
 
 
 MODS::Record
 -=-=-=-=-=-=
 
 In one of our Catmandu projects we created a Perl connector for Fedora
 Commons (http://search.cpan.org/~hochsten/Catmandu-FedoraCommons-0.24).
 One of our goals was to integrate better with the Islandora project. For
 this we needed a Perl MODS parser. As there was no module available on
 CPAN, we provided a top-level module, modeled on MARC::Record, called
 MODS::Record
 (http://search.cpan.org/~hochsten/MODS-Record-0.05/lib/MODS/Record.pm). I
 hope this will be of some help for the community. If there are coders
 here who would like to contribute to the MODS package please drop me a
 line. I think CPAN MODS support shouldn't be dependent on one coder, one
 institution.
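 
 A hedged sketch of what reading a record might look like; the method names
 here (from_xml, as_xml) are assumptions based on the MARC::Record-like
 design described above, so check the MODS::Record POD before relying on
 them:
 
   use MODS::Record;
   use IO::File;
 
   # parse a MODS XML file into a MODS::Record object ...
   my $mods = MODS::Record->from_xml( IO::File->new('record.mods') );
 
   # ... and serialize it back to XML
   print $mods->as_xml;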
 
 Greetings from a sunny Belgium,
 Patrick




Re: reading and writing of utf-8 with marc::batch [resolved; gigo]

2013-03-28 Thread Eric Lease Morgan

Thank you for all the input, and I think I have resolved my particular issue. 
Battle won. War still raging.

Using the script suggested by Galen as a starting point, I wrote the following 
hack to output the numbers of MARC records containing non-UTF-8 characters, 
but the script output nothing; all the data in all of my records was encoded as 
UTF-8:

  #!/usr/bin/perl

  # require
  use strict;
  use Encode;

  # initialize
  binmode STDIN, ':bytes';
  $/ = "\035"; 
  my $i = 0;

  # read STDIN, one MARC record at a time
  while ( <STDIN> ) {

    # increment
    $i++;

    # check validity; decode() croaks on malformed UTF-8
    eval { my $utf8str = Encode::decode( 'UTF-8', $_, Encode::FB_CROAK ); };

    # check for error
    if ( $@ ) { print "Record $i contains non-UTF-8 characters\n"; }

  }

  # done
  exit;


Since all of the data in all of my records was UTF-8, all of the leaders 
of all of the records need to have a value of "a" set in position #9 of the 
leader. So I wrote the following hack (circumventing MARC::Batch):

  #!/usr/bin/perl

  # require
  use strict;

  # initialize
  binmode STDIN,  ':bytes';
  binmode STDOUT, ':bytes';
  $/ = "\035"; 

  # loop through the input, one MARC record at a time
  while ( <STDIN> ) {

    # set leader position #9 to "a" (UTF-8) and output
    substr( $_, 9, 1 ) = 'a';
    print $_;

  }

  # done
  exit;


I then fed the output of my fix routine to my indexing routine, and all of my 
problems seemed to go away. GIGO?

I'm still not sure, but I think deep within MARC::Batch some sort of encoding 
is observed, honored, and output. And when the denoted encoding is not accurate 
and things like binmode( FILE, ':utf8' ) get called, the output gets munged. Again, I'm 
not sure. It is almost exhausting.


-- 
Eric Morgan
University of Notre Dame








Re: reading and writing of utf-8 with marc::batch [terminal]

2013-03-27 Thread Eric Lease Morgan

On Mar 26, 2013, at 5:57 PM, Leif Andersson leif.anders...@sub.su.se wrote:

 my first guess would be your terminal is not utf8.

While I'm not positive my terminal is doing UTF-8, I think it is. When I dump 
the data at the beginning, the output to the terminal is correct. After I run 
my script, the output to the same terminal is incorrect. 

--
Eric Lease Morgan



Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan

A number of people have alluded to the problem of double encoding, and I'm 
beginning to think this is true. 

I have isolated a number of problem records. They all contain diacritics, but 
they do not have an "a" in position #9 of the leader -- 
http://dh.crc.nd.edu/tmp/original.marc  Can someone verify that the file 
contains UTF-8 characters for me?

For these same records I have also added an "a" in position #9 and created a 
similar file -- http://dh.crc.nd.edu/tmp/fixed.marc  

Is it true that original.marc is not denoted correctly, but fixed.marc is 
denoted correctly?

-- 
Eric Morgan



Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan

On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan emor...@nd.edu wrote:

 When it calls as_usmarc, I think MARC::Batch tries to honor the value set in 
 position #9 of the leader. In other words, if the leader is empty, then it 
 tries to output records as MARC-8, and when the leader has a value of "a", it 
 tries to encode the data as UTF-8.

How can I figure out whether or not a MARC record contains ONLY characters from 
the UTF-8 character set?

Put another way, how can I determine whether or not position #9 of a given MARC 
leader is accurate? If position #9 is an "a", then how can I read the balance 
of the record to determine whether or not all the characters really and truly 
are UTF-8 encoded?

--
Eric This Is Almost Too Much For Me Morgan



senior programmer analyst

2006-05-31 Thread Eric Lease Morgan


Notre Dame is hiring a Senior Programmer Analyst, and if you have any  
questions about the position do not hesitate to drop me a line.



Job Description

The Digital Access and Information Architecture Department of the  
University Libraries of Notre Dame is seeking a Senior Programmer  
Analyst. This position will have three types of responsibilities: 1)  
write and maintain object-oriented Perl programs, 2) provide systems  
administration services for a number of Linux-based platforms, and 3)  
actively participate in the general workings of the Department.


The goal of the Department is to help the Libraries implement digital  
library collections and services. Some of the short-term projects  
this position provides support for may include: creating a portal  
for Catholic research materials, enhancing services applied against  
an institutional repository, supporting the campus-wide search  
engine, implementing a method for creating and disseminating the  
content of TEI files, supporting a local LOCKSS host, exploiting Web  
Services-based computing to acquire and disseminate information,  
developing and supporting an open source software digital library  
system called MyLibrary. The ability to write Web-based computer  
programs in object-oriented Perl is a must, but experience with other  
programming languages is desirable. The successful candidate must also  
know how to design a (MySQL) relational database, write a valid XML file  
given a DTD and/or XML schema, and securely administer small- to  
mid-sized computers running the Linux operating system.


The Department is small, project-oriented, and collaborative in  
nature. The successful candidate must also possess good written and  
oral communication skills exemplified by the ability to listen and  
share ideas in many forms.



Minimum Qualifications

This position must have a college degree in Computer Science (or in a  
related field) and/or demonstrated professional experience. The  
person filling this position must have the ability to:


  * Write computer applications in object-oriented Perl.

  * Draw an entity-relationship diagram and therefore have the
ability to normalize a relational database and execute SQL
queries against a relational database.

  * Write well-formed and valid XML given an XML schema or DTD.

  * Listen to other people.

  * Communicate effectively in written and oral forms.

  * Work in a collaborative environment.

  * Troubleshoot and resolve critical hardware/software problems
outside normal business hours.

The incumbent must have at least three years of programming  
experience while working in a collaborative environment and have  
demonstrated their ability to document their efforts in written and  
oral form.



Preferred Qualifications

The ability to write computer programs in languages other than Perl  
(such as JavaScript, Java, PHP, or C) is desirable. People who have  
worked in libraries or academia are encouraged to apply. Knowledge  
of the open source software development process is a plus.



Position Pay Range - $3,560-$5,980/Month


To apply, go to http://jobs.nd.edu and search for requisition number  
020060267. AA/EOE


--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame

(574) 631-8604




open source tools and xml

2006-04-18 Thread Eric Lease Morgan


I will be facilitating a half-day tutorial on open source software  
and XML at the upcoming Joint Conference on Digital Libraries (JCDL),  
and I thought some people here might want to attend. JCDL has the  
reputation for being a quality conference, and Chapel Hill (North  
Carolina) is a nice place to visit.



  Title
Tutorial 11: Exploiting open source tools to create, maintain,
  and disseminate XML content

  Abstract
XML is quickly becoming the means of marking up data for the
  purposes of transmitting information from one computer to
  another. While XML can be created by hand, the process is tedious
  and not necessarily scalable. Software systems can address this
  problem, and this tutorial enumerates, describes, and
  demonstrates ways open source software can be used to create,
  maintain, and disseminate XML. The goal of this tutorial is to
  increase participants' knowledge of these tools and to
  demonstrate how to take advantage of them in everyday digital
  library work and software development.

  Target Audience
Software engineers and librarians/intermediate

  Presenter
Eric Lease Morgan is the Head of the Digital Access and
  Information Architecture Department at the University Libraries
  of Notre Dame. He considers himself to be a librarian first and a
  computer user second. His professional goal is to discover new
  ways to use computers to provide better library service. Some of
  his more well-known investigations and implementations include
  MyLibrary and the Alex Catalogue of Electronic Texts. An advocate
  for open source software and open access publishing, Morgan has
  been freely distributing his software and publications for years
  before the terms "open source" and "open access" were coined.
  Morgan also hosts his own Internet domain, infomotions.com.

  http://jcdl2006.org/program/afternoon-tutorials


--
Eric Morgan
University Libraries of Notre Dame



mylibrary manual

2006-03-22 Thread Eric Lease Morgan



I am happy and proud to announce the availability of the newest  
version of the MyLibrary manual called "Designing, Implementing, and  
Maintaining Digital Library Services and Collections with MyLibrary".  
See:


  http://dewey.library.nd.edu/mylibrary/manual/

Code4Lib'ers will enjoy it because the principles it puts forth can  
be applied to many digital library settings. OSS4Lib'ers will enjoy  
it because it puts into practice free software as well as open  
access publishing. Perl4Lib'ers will enjoy it because it is pure  
Perl. Beginning Perl scripters may benefit most from the tutorial.  
Something for everyone.



About the book and who should read it

The book is a manual, and its purpose is to outline the principles  
and processes necessary to implement digital library collections and  
services. It uses MyLibrary as an example but the principles and  
processes can be applied to just about any digital library system or  
application.


The manual is intended to be read by administrators who need to know  
what and how many resources to allocate to a digital library. It is  
intended to be read by librarians who are responsible for collecting  
and organizing content as well as ensuring the library's usability.  
The manual is intended to be read by systems administrators who are  
in charge of providing the technical infrastructure for the system.  
Last but not least, it is intended for programmers who will use the  
underlying Perl API to provide services against the collection.



What the book contains and who helped write it

The book's 200+ pages are distributed in two volumes and freely  
available in HTML and PDF formats. Co-written by seventeen excellent  
authors, the book elaborates upon digital library topics including  
information architecture, content standards, user-centered design,  
fundamental computer technologies, techniques for initial  
implementation & ongoing maintenance, and of course the MyLibrary  
Perl application programmer's interface. Here is an outline of the  
book's contents:


  * Designing, Implementing, and Maintaining Digital Library
Services and Collections with MyLibrary by Eric Lease Morgan
(University of Notre Dame)

  * Pioneering Portals: A History Of [EMAIL PROTECTED] by
Keith Morgan (North Carolina State University)

  * Information architecture

o First Principles of Information Architecture: "On
  your Mark. Get set. Go!" not "Fire, and then Aim." by
  Eric Lease Morgan (University of Notre Dame)

o Facets and Terms in MyLibrary by Tom Lehman
  (University of Notre Dame)

  * The Importance of Content Standards in Digital Libraries
by Leslie Johnston (University of Virginia Library)

  * User-centered design

o Usability Testing: a Key to User-centered Designs by
  Terry Huttenlock (Wheaton College)

o Surveys by Tom Lehman (University of Notre Dame)

o Focus Group Interviews by Megan Johnson (Appalachian
  State University)

o Attracting Users by Michael Yunkin (University of
  Nevada, Las Vegas)

o Card Sorting by Terry Nikkel and Shelley McKibbon
  (Dalhousie University Libraries)

o Paper Prototyping by Nora Dimmock (University of
  Rochester)

o Low-cost Recording of Usability Tests by Martin
  Courtois (Kansas State University)

o Communicating Usability Results by Brenda Reeb
  (University of Rochester)

o Case Studies by Hal Kirkwood (Purdue University),
  Leslie Johnston (University of Virginia Library), and
  Alison Aldrich & Vishwam Annam (Wright State
  University Libraries)

  * Underlying technologies

o What is XML, and Why Should I Care? by Tod Olson
  (University of Chicago)

o What are Relational Databases, and Why Should I Care?
  by Vishwam Annam (Wright State University Libraries)

o What are Indexers and Why Should I Care? by Peter
  Karman

  * Implementation and Maintenance by Eric Lease Morgan
(University of Notre Dame)

  * MyLibrary Tutorial by Eric Lease Morgan (University of
Notre Dame)

  * The MyLibrary Perl API by Robert Fox (University of
Notre Dame)


Colophon

The book is licensed under the GNU General Public License and is an  
example of open access publishing. Authors have retained copyright to  
the things they have written. The manuscript was marked up in DocBook XML  
and transformed into HTML and PDF files using XSLT stylesheets,  
xsltproc, and fop.


Questions, comments, corrections, criticisms, and clarifications are  
more than welcome. Send them to [EMAIL PROTECTED]


--
Eric Lease Morgan and Team MyLibrary Manual



Re: dereferencing an array - Pt 2

2006-02-11 Thread Eric Lease Morgan


On Feb 11, 2006, at 8:16 AM, Brad Baxter wrote:


I have this sample data structure:

   my %profile = (
     'subjects' => {
       'astronomy' => {
         'telescope world' => 'http://telescope.com',
         'stars r us'      => 'http://websters.com',
         'asto magazine'   => 'http://oxford.edu'
       },
       'mathematics' => {
         '2 + 2 = 4'    => 'http://catalog.nd.edu',
         'math library' => 'http://worldcat.com'
       }
     },
     'tools' => {
       'dictionaries' => {
         'websters' => 'http://websters.com',
         'oxford'   => 'http://oxford.edu'
       },
       'catalogs' => {
         'und'      => 'http://catalog.nd.edu',
         'worldcat' => 'http://worldcat.com'
       }
     }
   );

I now need to build %profile programmatically. As I loop through a set
of information resources I can determine the following values:

   1. resource name (ex: telescope world)
   2. URL (ex: http://telescope.com)
   3. term (ex: astronomy)
   4. facet (ex: subjects)

Given these values, how can I build %profile?



Short answer: $profile{ $facet }{ $term }{ $resource } = $url;



Wow! Perfect!!

I have been able to take what Jonathan Gorman, Bruce Van Allen, and  
Brad Baxter have given me and incorporate it into the beginnings of  
a patron-specific interface for MyLibrary. In MyLibrary, patrons can be  
created and cataloged with facet/term combinations -- a controlled  
vocabulary. These same facet/term combinations are used to catalog  
information resources. Thus, through the controlled vocabulary I am  
able to create relationships between resources and patrons.


The result is the display of a set of information resources designed  
for individuals with particular characteristics. For example, try the  
following URLs. Each points to a different patron with different  
characteristics, and each page provides the ability to display the  
information resources in an alphabetical or grouped view:


  * Andrew Carnegie
http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=194

  * Leonardo da Vinci
http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=191

  * Galileo Galilei
http://dewey.library.nd.edu/morgan/portal/?cmd=patron&id=193


Thanks guys. I have added your names to my code.
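
For reference, Brad's one-liner in the context of a loop -- a minimal  
sketch in which the @resources array and its hash keys are hypothetical  
stand-ins for whatever the resource loop provides:

  # build %profile from (facet, term, name, url) values
  my %profile;
  foreach my $r (@resources) {
    $profile{ $r->{facet} }{ $r->{term} }{ $r->{name} } = $r->{url};
  }

Perl autovivifies the intermediate hash references along the way, so no  
explicit initialization is needed.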

--
Eric Morgan



dereferencing an array

2006-02-10 Thread Eric Lease Morgan


How do I loop through a reference to an array?

I have the following data structure:

  my %facets = (
    'audiences' => [('freshman', 'senior')],
    'subjects'  => [('music', 'history')],
    'tools'     => [('dictionaries', 'catalogs')]
  );

I can use this code to get the keys for %facets:

  foreach my $key (sort(keys(%facets))) { print $key, "\n" }

But since $key points to the reference of an array, I don't know how  
to loop through the referenced array.
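
A minimal sketch of the dereference (the answer the thread arrives at  
below):

  foreach my $key (sort(keys(%facets))) {
    print $key, "\n";
    # @{ ... } turns the array reference back into a list
    foreach my $item (@{ $facets{$key} }) {
      print "\t", $item, "\n";
    }
  }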



--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame

(574) 631-8604






Re: dereferencing an array

2006-02-10 Thread Eric Lease Morgan


On Feb 10, 2006, at 3:58 PM, Eric Lease Morgan wrote:

Now I'm going to make each value in the referenced array a  
reference to a hash; I'm going to make my data structure deeper.  
'More later.


Since that worked so well, I'll ask this question. Given the  
following data structure, how do I print out something like this:


  tools
dictionaries
  websters - http://websters.com
  oxford - http://oxford.edu
catalogs
  und - http://catalog.nd.edu
  worldcat - http://worldcat.com


  my %facets = (

    'tools' => [(

      'dictionaries' => [(
        'websters' => 'http://websters.com',
        'oxford'   => 'http://oxford.edu'
      )],

      'catalogs' => [(
        'und'      => 'http://catalog.nd.edu',
        'worldcat' => 'http://worldcat.com'
      )]

    )]

  );


This code doesn't cut it:

  foreach my $key (sort(keys(%facets))) {

    print $key, "\n";

    foreach my $term (@{$facets{$key}}) {

      print "\t", $term, "\n";

    }

  }


Is my data structure dumb?

--
Eric Morgan



Re: dereferencing an array - Pt 2

2006-02-10 Thread Eric Lease Morgan


On Feb 10, 2006, at 5:41 PM, Bruce Van Allen wrote:


foreach my $facet_key (keys %facets) {
  print "$facet_key\n";
  my %sub_hash = %{ $facets{$facet_key} };
  foreach my $sub_key (keys %sub_hash) {
    print "\t$sub_key\n";
    my %inner_hash = %{ $sub_hash{$sub_key} };
    foreach my $inner_key (keys %inner_hash) {
      print "\t\t$inner_key - $inner_hash{$inner_key}\n";
    }
  }
}



This has been VERY helpful, and I appreciate the assistance. Now I  
need to programmatically build the hash.


I have this sample data structure:

  my %profile = (
    'subjects' => {
      'astronomy' => {
        'telescope world' => 'http://telescope.com',
        'stars r us'      => 'http://websters.com',
        'asto magazine'   => 'http://oxford.edu'
      },
      'mathematics' => {
        '2 + 2 = 4'    => 'http://catalog.nd.edu',
        'math library' => 'http://worldcat.com'
      }
    },
    'tools' => {
      'dictionaries' => {
        'websters' => 'http://websters.com',
        'oxford'   => 'http://oxford.edu'
      },
      'catalogs' => {
        'und'      => 'http://catalog.nd.edu',
        'worldcat' => 'http://worldcat.com'
      }
    }
  );


I use the following code, based on the good work of Bruce, to  
traverse %profile and output a set of nested HTML lists. It works for  
any size of %profile. Fun!


  print "<ul>";
  foreach my $facet (sort(keys(%profile))) {
    print "<li>$facet";
    my %facets = %{$profile{$facet}};
    print "<ul>";
    foreach my $term (sort(keys(%{$profile{$facet}}))) {
      print "<li>$term";
      my %terms = %{$facets{$term}};
      print "<ol>";
      foreach my $resource (sort(keys(%terms))) {
        print "<li><a href='$facets{$term}{$resource}'>$resource</a></li>";
      }
      print "</ol>";
      print "</li>";
    }
    print "</ul>";
    print "</li>";
  }
  print "</ul>";


I now need to build %profile programmatically. As I loop through a set  
of information resources I can determine the following values:


  1. resource name (ex: telescope world)
  2. URL (ex: http://telescope.com)
  3. term (ex: astronomy)
  4. facet (ex: subjects)

Given these values, how can I build %profile?

--
Eric Perl Data Structures !R My Forte Morgan






mylibrary portal

2006-01-27 Thread Eric Lease Morgan


We have created an additional MyLibrary end-user interface that will  
be distributed with the Perl modules:


  http://dewey.library.nd.edu/mylibrary/portal/

The interface supports browse by facet and term, as well as search.  
The administrative interface allows you to add facets, terms,  
location types, and resources. It then provides the means to index  
the whole thing and make it accessible via SRU. The back-end can  
export the content of the database and make it available as an OAI  
data repository.


Please bang on the Portal a bit and tell us what you think. Our next  
step will be to fix abnormalities and enhance the whole thing with  
documentation (PODs).


--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604






Re: mylibrary tutorial

2006-01-23 Thread Eric Lease Morgan


I have all but finished my MyLibrary Tutorial:

  By the end of the tutorial the reader should be able to:
  create sets of facets, create sets of terms, create sets of
  librarians, create sets of location types, create sets of
  resources, classify librarians and resources with terms,
  work with sets of resources associated with particular sets
  of terms, output the resources' titles, descriptions and
  locations, create a freetext index of MyLibrary content,
  harvest OAI repositories and cache the content in a
  MyLibrary database.

  http://dewey.library.nd.edu/morgan/tutorial.txt

We will be including this document in the upcoming MyLibrary Manual.

--
Eric Lease Morgan
University Libraries of Notre Dame



Re: cpan

2006-01-16 Thread Eric Lease Morgan


On Jan 13, 2006, at 2:14 PM, Eric Lease Morgan wrote:

Is it kosher to upload something like the MyLibrary Perl modules to  
CPAN, and if so where would we put it?


Based on feedback I've gotten on and off list as well as from  
postings to the perl.module-authors mailing list/newsgroup, I think I  
will upload MyLibrary to CPAN and create a top-level namespace.


--
Eric Lease Morgan



mylibrary manual

2005-05-09 Thread Eric Lease Morgan
I have taken the next step toward writing a MyLibrary version 3.0 manual. 
See:

  http://dewey.library.nd.edu/mylibrary/manual/

As it stands right now, the manual covers the vast majority of the Perl 
API, and as soon as I figure out why my FO/PDF processor broke I will 
create a PDF version.

FYI.
--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604


wanted: short-term Perl programmer

2005-03-29 Thread Eric Lease Morgan
This is a want-ad for a short-term Perl programmer. Please share it as 
you see fit.

Short-term Perl programmer
The University Libraries of Notre Dame is seeking an expert 
Perl programmer to work on a short-term project for a professional 
salary.

Description: The Libraries is involved in a national research and 
development activity. One of the activity's goals is to enhance an 
information retrieval system with a "Find More Like This One" feature. 
This feature will:

1. Allow users to identify a desirable record from a list of search 
results

2. Select characteristics from the record the user deems significant
3. Return those characteristics back to the system
4. Use things like locally created dictionaries, WordNet, and/or other 
semantic tools to generate additional searches to be applied against 
other internal or external indexes

Requirements: The successful candidate must have exceptional skills in 
reading and writing object-oriented Perl programs in a Unix/Linux 
environment. The position requires the candidate to be able to document 
their code with comments as well as in the form of PODs. The position 
requires the candidate to be able to work in a collaborative 
environment. Thus, the candidate must possess well-developed 
communication skills.

Highly desirable: Applicants who demonstrate an understanding of 
relational database techniques, XML and Web Services, academia, as well 
as the principles of open source software will be given preference.

Work environment: The University Libraries is located in Notre Dame, IN 
(just outside South Bend) about ninety miles east of Chicago. Because 
of the location, telecommuting is possible, but regular weekly site 
visits are necessary.

Start date: Immediately
End date: No later than August 31, 2005
Salary: Starting at $24/hour and negotiable depending on 
qualifications, experience, and flexibility

Application: Send cover letters, resumes, and questions to Eric Lease 
Morgan ([EMAIL PROTECTED]). All inquiries will be 
acknowledged.

--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604


option items sorted in pop-up menus

2005-03-10 Thread Eric Lease Morgan
Is there any way to use CGI.pm and still have option items sorted in 
pop-up menus?

CGI.pm provides cool ways to create HTML forms. To create a pop-up menu, 
I can do something like this:

  # create a hash of terms
  my @terms = MyLibrary::Term->get_terms(sort => 'name');
  my %terms;
  foreach (@terms) { $terms{$_->term_id} = $_->term_name }
  ...
  $html .= $cgi->popup_menu(-name => 'id', -values => \%terms);
where:
  -name is the name of the parameter I want returned
  -values is a reference to a hash containing id and value pairs.
The problem is that -values is a hash, and when CGI.pm displays the 
pop-up menu the items do not come out sorted. (They were inserted into 
the hash in sorted order, but Perl hashes do not preserve insertion order.)

Is there some way I can get CGI.pm to output the popup items in sorted 
order, or should I write my own little function to do this for me?
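
One common approach, sketched here under the assumption that @terms is 
still sorted by name: pass -values an array reference (which preserves 
order) and supply the display strings separately via -labels:

  # ids in name order, plus an id-to-name lookup table for display
  my @ids    = map { $_->term_id } @terms;
  my %labels = map { $_->term_id => $_->term_name } @terms;

  $html .= $cgi->popup_menu(-name => 'id', -values => \@ids, -labels => \%labels);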

--
Eric Lease Morgan
University Libraries of Notre Dame


Re: adding a MARC tag called SYS

2004-08-03 Thread Eric Lease Morgan
On Jul 30, 2004, at 8:56 AM, Eric Lease Morgan wrote:
Besides the fact that the addition of a field named SYS may be a 
"feature" of my integrated library system, how can I add such a field 
to my data?
Well, now the whole thing is a moot point in my book. Instead of using 
a kewl SYS field in my records, I stuffed my data into 035 subfield 
"y".
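
In code, that approach looks something like this minimal sketch (the 
blank indicators are an assumption):

  # append an 035 field carrying the system number in subfield "y"
  $record->append_fields(
    MARC::Field->new('035', ' ', ' ', y => $sysno)
  );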

'Sorry for the fuss.
--
Eric Lease Morgan
University Libraries of Notre Dame


adding a MARC tag called SYS

2004-07-30 Thread Eric Lease Morgan
How can I add a MARC tag called SYS to a set of MARC records?
I want to loop through a set of MARC records, extract the last nine 
characters of the 001 field, add a new field to each record called SYS, 
and output the resulting data to a new file. I have the following code 
snippet that does the work:

# read each record
while (my $record = $batch->next()) {

    # get 001 field
    my $field = $record->field('001')->as_string;

    # create a system number, the last nine characters of $field
    my $sysno = substr $field, -9, 9;

    # add a sys tag
    $record->append_fields(MARC::Field->new('SYS', $sysno));

    # write to STDOUT
    print $record->as_usmarc();
}
Alas, MARC::Field says I need to include a subfield in the SYS field, 
but that is not what I want. I want the SYS field to contain neither 
indicators nor subfields. I want it to behave just like the normal MARC 
control fields with tags less than 010.

Besides the fact that the addition of a field named SYS may be a 
"feature" of my integrated library system, how can I add such a field 
to my data?

--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604


Re: xml::libxslt [resolved]

2004-07-14 Thread Eric Lease Morgan
On Jul 14, 2004, at 11:01 AM, Randy Kobes wrote:
Second, Perl returns this in the terminal:
   Can't load 'C:/Perl/site/lib/auto/XML/LibXSLT/LibXSLT.dll'
   for module XML::LibXSLT: load_file:The specified procedure
   could not be found at C:/Perl/lib/DynaLoader.pm line 230.
   at C:\xml\bin\xsltproc.pl line 16
   Compilation failed in require at C:\xml\bin\xsltproc.pl line 16.
   BEGIN failed--compilation aborted at C:\xml\bin\xsltproc.pl line 
16.

Where can I get another copy of XML::LibXSLT for ActiveState 
ActivePerl?
I'm not aware of one ... This problem usually means that
your system has a version of the external dlls needed by the
Perl modules that is incompatible with the dlls the Perl
modules were compiled with. What I would suggest is to
uninstall XML-LibXML, XML-LibXML-Common, and XML-LibXSLT,
and then reinstall them, making sure that the post-install
scripts run by XML-LibXML-Common (to install libxml2.dll)
and XML-LibXSLT (to install libxslt-related dlls) are
successfully run. If the post-install scripts do find a copy
of the needed dlls on your system, have it fetch and install
them anyway, just to make sure the correct version of the
dlls are available. And adjust your PATH environment
variable to make sure the directory these dlls live in is
searched before other directories that may contain other
versions of the dlls.
Thank you. This is what I needed to know. After I got rid of the 
extraneous *.dll files (specifically libxml2.dll installed by Swish-e) 
I was able to run scripts written with XML::LibXML and XML::LibXSLT 
successfully. What's more, my swish-e programs still work as desired. 
Whew!

Thank you, and the open source software + mailing list combination 
comes through yet again.

--
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604


xml::libxslt on windows

2004-07-12 Thread Eric Lease Morgan
Have you gotten XML::LibXSLT to work on Windows, and if so, then how?
I have installed ActiveState Perl version 5.8.4 build 810 on my Windows 
computer. It resides in c:\Perl.   I configured ppm to point to an 
additional repository at http://theoryx5.uwinnipeg.ca/ppms/ where I 
found XML::LibXML ppm files.

I got XML::LibXML to work just fine, but XML::LibXSLT fails like this:
  Can't load 'C:/Perl/site/lib/auto/XML/LibXSLT/LibXSLT.dll' for
  module XML::LibXSLT: load_file:The specified procedure could
  not be found at C:/Perl/lib/DynaLoader.pm line 230. at
  C:\xml\bin\xsltproc.pl line 16
  Compilation failed in require at C:\xml\bin\xsltproc.pl line 16.
  BEGIN failed--compilation aborted at C:\xml\bin\xsltproc.pl line 16.
Windows also spits this out in a dialog box:
  The procedure entry point xmlDictCreateSub could not be
  located in the dynamic link library libxml2.dll
Do y'all have any hints on how I can resolve this problem?
--
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604


RE: using xml::libxml to find & replace in xml documents

2004-07-10 Thread Eric Lease Morgan
I wrote:
Has anybody here written one or more Perl scripts using XML::LibXML to
find & replace in XML documents?...
Thank you for all the replies. One person recommended a Perl module 
called XML::Twig. Another person recommended I use regular expressions. 
Two people recommended the use of XSLT. One of these provided sample 
code. The other wrote a full-blown program! (Thanks, Andrew 
Houghton!)

In the end I re-read some of my Perl/XML books and decided to write a 
SAX filter using XML::SAX::ParserFactory. Such a filter has the 
following shape:

  use strict;
  use XML::SAX::ParserFactory;

  my $handler = MyHandler->new();
  my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
  $parser->parse_uri($ARGV[0]);
  exit;

  package MyHandler;

  sub new {
      my $type = shift;
      return bless {}, $type;
  }

  sub start_element {
      my ($self, $element) = @_;
      print "Starting element $element->{Name}\n";
  }

  sub end_element {
      my ($self, $element) = @_;
      print "Ending element $element->{Name}\n";
  }

  sub characters {
      my ($self, $characters) = @_;
      print "characters: $characters->{Data}\n";
  }

  1;
I have saved my script at the following location:
  http://infomotions.com/musings/getting-started/fix-ead.txt
The script will eventually be a part of a workshop I am giving called 
"Shining a LAMP on XML". The outline, to date, is here:

  http://infomotions.com/musings/getting-started/LAMP.txt
'More later.
--
Eric Lease Morgan


using xml::libxml to find & replace in xml documents

2004-07-08 Thread Eric Lease Morgan
Has anybody here written one or more Perl scripts using XML::LibXML to 
find & replace in XML documents?

I have a set of 700 XML files. Each one has an incorrect attribute 
value in a processing instruction, a few invalid attributes in a 
particular element, and a set of elements that are no longer valid 
against the DTD.

I want to use XML::LibXML to clean up these files, and I'm hoping someone 
out there has already done this to some extent and can share their 
code. While the XML::LibXML modules are very functional, I wish they 
had more examples in their PODs.
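
In the spirit of more examples, a minimal sketch of this kind of clean-up 
with XML::LibXML; the element and attribute names (oldElement, badAttr) 
are hypothetical stand-ins for the real offenders:

  use XML::LibXML;

  my $parser = XML::LibXML->new;
  my $doc    = $parser->parse_file('record.xml');

  # drop an invalid attribute wherever it occurs
  $_->removeAttribute('badAttr') for $doc->findnodes('//*[@badAttr]');

  # remove elements that are no longer valid against the DTD
  $_->unbindNode for $doc->findnodes('//oldElement');

  $doc->toFile('record-fixed.xml');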

--
Eric Lease Morgan
University Libraries of Notre Dame


berkeleydb xml

2004-05-25 Thread Eric Lease Morgan
Does anybody here have any experience with BerkeleyDB XML?
I have finally gotten it to compile and install, and I'm intrigued 
with the idea of a native XML database, especially combined with the use 
of Perl and its included Perl API. For more information see:

  http://www.sleepycat.com/xmldocs/ref_xml/toc.html
--
Eric Lease Morgan
University Libraries of Notre Dame


STDIN as well as command line input

2004-04-26 Thread Eric Lease Morgan
How do I get a Perl program to accept input from STDIN as well as 
command line input?

I have a program (foo.pl) that is designed to read the contents of 
@ARGV and process each item in the array. Tastes great. Less filling. 
So, when I do something like this, things work just fine:

  %foo.pl a b c d e f g

I have another program (bar.pl) that prints to STDOUT. The output is 
the same sort of data needed by foo.pl. So, I thought I'd give this is 
a whirl:

  %bar.pl | foo.pl

But alas, foo.pl never seems to get the input sent from bar.pl. It does 
not seem to read from STDIN.

What should I do to my first program (foo.pl) so it can accept command 
line input as well as input from STDIN?
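
A minimal sketch of one way to do it, assuming the items are 
whitespace-separated tokens (as they are in the follow-up post):

  # use @ARGV when arguments are given; otherwise read tokens from STDIN
  my @items = @ARGV ? @ARGV : map { split } <STDIN>;

  foreach my $item (@items) {
    # process $item ...
  }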

--
Eric Lease Morgan
(574) 631-8604


Re: STDIN as well as command line input

2004-04-26 Thread Eric Lease Morgan
On Apr 26, 2004, at 10:43 AM, Andy Lester wrote:

How are you reading from the files?  Opening them yourself one at a 
time?
Don't.  Use the magic filehandle.
On Apr 26, 2004, at 10:44 AM, Dennis Boone wrote:

If your perl script is structured like this:

while (<>)
{
# process
}
then perl will process stdin if no files are named, or the contents
of each file named on the command line in sequence.


Alas, my inputs are not the names of files. They are scalars, like this:

  plato-cratylus-1072532262 plato-charmides-1072462708 
bacon-new-1072751992

--
Eric
(574) 631-8604



Re: STDIN as well as command line input

2004-04-26 Thread Eric Lease Morgan
On Apr 26, 2004, at 10:53 AM, Michael McDonnell wrote:

This sort of situation can be dealt with using backticks:

foo.pl `bar.pl`

This is nice in that you can probably do this too:

foo.pl a b c `bar.pl` d e f g h `bar.pl x y z` i j k

A popular GNUism might be helpful here as well.  Many GNU programs use 
an optional command line argument of "--" to indicate that input should 
be taken from STDIN instead of from other command line arguments.
The backticks solution works well. Thank you. I will see about 
modifying my code to get smart about "--" arguments. Again, thanks.

--
Eric
(574) 631-8604



Re: automagically create browsable POD pages [pod2html]

2004-04-02 Thread Eric Lease Morgan
On Apr 1, 2004, at 10:33 AM, Eric Lease Morgan wrote:

Is there some sort of make command I can run that will read the PODs 
in my distribution, turn them into (X)HTML files, and save them in a 
specified local directory of my distribution's filesystem?
Thank you for the prompt replies, but the suggestions are overkill. I 
simply want to:

  1. create a doc directory
  2. loop through my lib directory looking for pods
  3. convert each pod to xhtml
  4. save converted files to the doc directory
I think I will write a local wrapper to pod2html.

BTW, pod2html looks like it will already do this, but I can't figure 
out how to make it:

  1. create a single xhtml file for each pod
  2. give each file a specific name
Yeah, I could do this pod by pod, but I'm lazy.

--
Eric Morgan


Re: automagically create browsable POD pages [pod2html]

2004-04-02 Thread Eric Lease Morgan
On Apr 2, 2004, at 6:55 AM, Eric Lease Morgan wrote:

Thank you for the prompt replies, but the suggestions are overkill. I 
simply want to:

  1. create a doc directory
  2. loop through my lib directory looking for pods
  3. convert each pod to xhtml
  4. save converted files to the doc directory
Like this, but there has got to be a better way:

#!/usr/bin/perl

use File::Basename;
use File::Find;

my $POD2HTML = 'pod2html';

my $IN  = $ARGV[0];
my $OUT = $ARGV[1];

find(\&process_files, $IN);
exit;

sub process_files {

  # get the name of the found file
  my $file = $File::Find::name;

  # make sure it has the correct extension; use return (not next)
  # to leave a File::Find callback
  return if ($file !~ m/\.pm$/);

  # extract the necessary parts of the file name
  (my $name, my $path, my $suffix) = File::Basename::fileparse($file, '\..*');

  # build and run the pod2html command
  my $cmd = "$POD2HTML --outfile=$OUT/$name.html --title=$name $file";
  print "$cmd\n";
  system $cmd;

}

--
Eric Lease Morgan


automagically create browsable POD pages

2004-04-01 Thread Eric Lease Morgan
Is there some sort of incantation I can send to a Perl-generated 
Makefile in order to automagically create browsable POD pages?

Here at MyLibrary Central we have been re-writing MyLibrary. We are 
using the following technique:

  1. Write POD.
  2. Write tests.
  3. Write module.
  4. Go to Step #1 until tests pass and module is complete.
  5. Write scripts using module.
The process works great, and that is an understatement. Through this 
process we have created bunches o' PODs, and we want to share them with 
the world. Release early. Release often.

Is there some sort of make command I can run that will read the PODs in 
my distribution, turn them into (X)HTML files, and save them in a 
specified local directory of my distribution's filesystem?

--
Eric Morgan
University Libraries of Notre Dame
(574) 631-8604



net::z3950

2004-02-19 Thread Eric Lease Morgan
If I have installed ActiveState Perl on my Windows computer, then how 
do I install net::z3950?

When I run ppm and then type 'install net-z3950' I get a message saying 
that 45 bytes were downloaded, things were successful, but there are no 
new modules in my path.

Has anybody here installed Net::Z3950 on Windows?

--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604



Really Rudimentary Catalog

2004-02-15 Thread Eric Lease Morgan
On 1/5/04 11:34 AM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

 My book catalog excels at inventorying my collection. It does a very poor job
 at recommending/suggesting what book(s) to use. The solution is not with more
 powerful search features, nor is it with bibliographic instruction. The
 solution is lies in better, more robust data, as well as access to the full
 text. This is not just a problem with my catalog. It is a problem with online
 public access catalogs everywhere, but I deviate. I'm off topic. All of this
 is fodder for my book catalog's About text.

I have packaged up my implementation of a book catalog (Really Rudimentary
Catalog), made the Perl source code available, and re-articulated my ideas
about the limitations of traditional library catalogs in the system's
"About" text:

  http://infomotions.com/books/?cmd=about

-- 
Eric Lease Morgan
University Libraries of Notre Dame



Re: Net::Z3950 and diacritics [book catalogs]

2004-01-05 Thread Eric Lease Morgan
On 12/16/03 8:57 AM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

 Upon further investigation, it seems that MARC::Batch is not necessarily
 causing my problem with diacritics, instead, the problem may lie in the way I
 am downloading my records using Net::Z3950

Thank you to everybody who replied to my messages about MARC data and
Net::Z3950.

I must admit, I still don't understand all the issues. It seems there are at
least a couple of character sets that can be used to encode MARC data. The
characters in these sets are not always 1 byte long (specifically the
characters with diacritics), and consequently the leader of my downloaded
MARC records was not always accurate, I think. Again, I still don't
understand all the issues, and the discrepancy is most likely entirely my
fault.

I consider my personal catalog about 80% complete. I have about another 200
books to copy catalog, and I can see a few more enhancements to my
application, but they will not significantly increase the system's
functionality. I consider those enhancements to be "featuritis". Using my
Web browser I can catalog about two books per minute.

In any event, the number of book descriptions from my personal catalog
containing diacritics is very small. Tiny. Consequently, my solution was to
either hack my MARC records to remove the diacritic or skip the inclusion of
the record all together.

The process of creating my personal catalog was very enlightening. The MARC
records in my catalog are very very similar to the records found in catalogs
across the world. My catalog provides author, title, and subject searching.
It provides Boolean logic, nested queries, and right-hand truncation. The
entire record is free-text searchable. Everything is accessible. The results
can be sorted by author, title, subject, and rank (statistical relevance). A
cool search is a search for "cookery":

  http://infomotions.com/books/?cmd=search&query=cookery

Yet, I still find the catalog lacking, and what it is lacking is/are three
things: 1) more descriptive summaries like abstracts, 2) qualitative
judgments like reviews and/or the number of uses (popularity), and 3) access
to the full text. These are problems I hope to address in my developing
third iteration of my Alex Catalogue:

  http://infomotions.com/alex2/

My book catalog excels at inventorying my collection. It does a very poor
job at recommending/suggesting what book(s) to use. The solution is not with
more powerful search features, nor is it with bibliographic instruction. The
solution lies in better, more robust data, as well as access to the full
text. This is not just a problem with my catalog. It is a problem with
online public access catalogs everywhere, but I deviate. I'm off topic. All
of this is fodder for my book catalog's About text.

Again, thank you for the input.

-- 
Eric Lease Morgan
University Libraries of Notre Dame



Re: Extracting data from an XML file

2004-01-05 Thread Eric Lease Morgan
I wrote:

 Can you suggest a fast, efficient way to use Perl to extract selected
 data from an XML file?...

First of all, thank you everyone who promptly replied to my query.

Second, I was not quite clear in my question. Many people said I should
write an XSLT style sheet to transform my XML document into HTML. This is in
fact what I do, but I need a process to not only transform each of my
documents but also to create author as well as title indexes to my
collection, and therefore I need to extract bits
of data from each of my original XML files.

Third, most of the replies fell into two categories: 1) use an XSLT style
sheet as a sort of subroutine, and 2) use XML::Twig.

Fourth, I tried both of these approaches plus my own, and timed them. I had
to process 1.5 MB of data in nineteen files. Tiny. Ironically, my original
code was the fastest at 96 seconds. The XSLT implementation came in second
at 101 seconds, and the XML::Twig implementation, while straightforward,
came in last at 141 seconds. (See the attached code snippets.)

Since my original implementation is still the fastest, and the newer
implementations do not improve the speed of the application, then I must
assume that the process is slow because of the XSLT transformations
themselves. These transformations are straight-forward:

  # transform the document and save it
  my $doc       = $parser->parse_file($file);
  my $results   = $stylesheet->transform($doc);
  my $html_file = "$HTML_DIR/$id.html";
  open OUT, "> $html_file";
  print OUT $stylesheet->output_string($results);
  close OUT;
  
  # convert the HTML to plain text and save it
  my $html      = parse_htmlfile($html_file);
  my $text_file = "$TEXT_DIR/$id.txt";
  open OUT, "> $text_file";
  print OUT $formatter->format($html);
  close OUT;

When my collection grows big I will have to figure out a better way to batch
transform my documents. I might even have to break down and write a shell
script to call xsltproc directly. (Blasphemy!)

-- 
Eric Lease Morgan
University Libraries of Notre Dame




  # my original code
  print "Processing $file...\n";
  my $doc    = $parser->parse_file($file);
  my $root   = $doc->getDocumentElement;
  my @header = $root->findnodes('teiHeader');
  my $author = $header[0]->findvalue('fileDesc/titleStmt/author');
  my $title  = $header[0]->findvalue('fileDesc/titleStmt/title');
  my $id     = $header[0]->findvalue('fileDesc/publicationStmt/idno');
  print "  author: $author\n   title: $title\n      id: $id\n\n";


  # using an XSLT stylesheet
  print "Processing $file...\n";
  my $style      = $parser->parse_file($AUTIID);
  my $stylesheet = $xslt->parse_stylesheet($style);
  my $doc        = $parser->parse_file($file);
  my $results    = $stylesheet->transform($doc);
  my $fullResult = $stylesheet->output_string($results);
  my @fullResult = split /#/, $fullResult;
  my $title      = $fullResult[0];
  my $author     = $fullResult[1];
  my $id         = $fullResult[2];
  print "  author: $author\n   title: $title\n      id: $id\n\n";


  # using XML::Twig
  print "Processing $file...\n";
  my ($author, $title, $id);
  my $twig = new XML::Twig(TwigHandlers => {
    'teiHeader/fileDesc/titleStmt/author'     => sub { $author = $_[1]->text },
    'teiHeader/fileDesc/titleStmt/title'      => sub { $title  = $_[1]->text },
    'teiHeader/fileDesc/publicationStmt/idno' => sub { $id     = $_[1]->text }});
  $twig->parsefile($file);
  print "  author: $author\n   title: $title\n      id: $id\n\n";


Re: constructing a Z39.50 search

2003-12-08 Thread Eric Lease Morgan
On 12/7/03 10:04 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

 Can you tell me how to construct a particular Z39.50 query? Specifically, how
 do I create a Library of Congress card number search or a MARC tag 001 search?

[EMAIL PROTECTED] wrote:

In order to construct a Z39.50 query for LC control number (e.g.,
2002012345), transmit the Use attribute value 9.  This will map to
appropriate internal searches.  We support either a keyword or a
left-anchored LCCN search.  Therefore, your intersite query will be
supported if it contains only the Use attribute (we'll default to a keyword
search), or you could also include additional attributes.  Several examples
follow:

 Use  Relation  Position  Structure  Truncation  Completeness
 ---  --------  --------  ---------  ----------  ------------
  9     none      none      none        none         none
  9       3         3         2         100            1
  9       3         1         1           1            1
  9     none        3       none        none         none
  9     none        1       none        none         none


With regard to the MARC 001 search (i.e., local control number -- Voyager
record ID), our server supports this if you transmit Use attribute value
12.  *HOWEVER*, there is a bug in Voyager servers that you need to know
about.  If your application maintains a connection (i.e., is not stateless),
Use attribute 12 will break subsequent searches in the session.  The
record requested by the local control number search (e.g., 123456) will be
returned correctly, but all subsequent searches in the session will return
the same record.  I'd recommend using the LCCN search instead.


Larry E. DixsonInternet:[EMAIL PROTECTED]
Network Development and MARC
   Standards Office, LM639
Library of CongressTelephone: (202) 707-5807
Washington, D.C.  20540-4402   Fax:   (202) 707-0115


Larry's answer was perfect. All I have to do is hack my script like this for
an LC card number search:

  # initialize a z39.50 query
  my $query = '@attr 1=9 ' . pop();
                     ^^^

Or this for a Voyager control number (key) search:

  # initialize a z39.50 query
  my $query = '@attr 1=12 ' . pop();
                     ^^^^

To make this happen I will add a couple of switches to the command-line
input, and I'm set.

Cool! These newfangled Internet and mailing list things work great!!

-- 
Eric Lease Morgan
University Libraries of Notre Dame

P.S. To whoever passed my original query along to Larry, I say, "Thank
you."




constructing a Z39.50 search

2003-12-07 Thread Eric Lease Morgan

Can you tell me how to construct a particular Z39.50 query? Specifically,
how do I create a Library of Congress card number search or a MARC tag 001
search?

I am writing a simple online public access catalog. For a good time try:

  http://infomotions.com/opac/?cmd=searchquery=civil*
  http://infomotions.com/opac/?cmd=searchquery=perl

To build my catalog I have written a command-line driven, brain-dead
acquisition application based on the MARC::Record tutorials. The
acquisition application takes ISBN numbers as input, searches them against
the Library of Congress's database, returns MARC records, and saves them
accordingly. My application contains this snippet of code:

  # check for command line input, a string of ISBN numbers
  exit if ($#ARGV < 0);
  
  # initialize a z39.50 query
  my $query = '@attr 1=7 ' . pop();
  
  # populate the query with ISBN numbers
  while (@ARGV) { $query = '@or @attr 1=7 ' . pop() . ' ' . $query }

This process works just fine, but every once in a while the queries fail
because more than one record matches an ISBN number. Consequently, I need to
be more specific. Maybe I could build a MARC tag 001 search. What does the
001 field contain? Is it normally indexed? If so, how can I create a $query
to specify a search against this field?

At the same time ISBN numbers sometimes fail because they are not found in
the data at all. In these cases I want to create a LC card number search.
Can somebody tell me how to create $query to specify a card search?

-- 
Eric Lease Morgan
University Libraries of Notre Dame






Re: test for an array

2003-09-30 Thread Eric Lease Morgan

Thank you for the prompt replies.

On 9/30/03 5:18 PM, Andy Lester [EMAIL PROTECTED] wrote:

 Rather than thinking of it in terms of what the function should
 return, think of it as what you're expecting.  For example, you might:
  
 
 my @vars = $librarian->term_ids( ... );
 is( scalar @vars, 3, "Got back three items" );
 is( $vars[0], "Smith", "Checking name" );
 is( $vars[1], "HR128", "Checking homeroom" );
 ok( !$vars[2], "Checking that Smith is NOT the principal" );

I can implement this, and "think of what you're expecting" makes a lot of
sense. Thank you.

-- 
Eric Morgan