Re: [CODE4LIB] Good advanced search screens

2008-11-14 Thread Mark Jordan
Hi David,

You might want to consider an advanced search interface that offers a varying 
number of options. We've done this to a certain extent in the PKP Metadata 
Harvester for schemas more complex than Dublin Core. An example of a harvester 
that has some MODS in it is at 
http://harvesters.sfu.ca/chodarr/index.php/search, if you want to see how we 
implemented this (click on the "More fields" button).

We're currently rewriting the Harvester so I'd be interested in hearing what 
you settle on. That particular application suffers from the same problem you're 
describing with WorldCat -- a very rich metadata set to search against, plus in 
the Harvester's case, new schemas can be added fairly easily, and we don't want 
admins to have to rewrite the search form when they add a new schema.
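A minimal sketch of the schema-driven idea in Python. Each schema registers its own searchable fields, a few are flagged as core, and the rest stay hidden until the user asks for "More fields". The registry, the SearchField type and build_form() are hypothetical; they are not how the PKP Harvester is actually implemented. The point is only that adding a schema changes data, not form code.

```python
from dataclasses import dataclass


@dataclass
class SearchField:
    name: str           # index name sent to the search backend
    label: str          # label shown on the form
    core: bool = False  # core fields always show; others sit behind "More fields"


# Hypothetical registry: schema name -> the fields it contributes.
SCHEMA_FIELDS: dict[str, list[SearchField]] = {}


def register_schema(schema: str, fields: list[SearchField]) -> None:
    """Adding a schema only updates this registry; no form code changes."""
    SCHEMA_FIELDS[schema] = fields


def build_form(schema: str, expanded: bool = False) -> list[str]:
    """Labels to render: core fields only, or every field once the user
    has clicked "More fields"."""
    fields = SCHEMA_FIELDS.get(schema, [])
    visible = fields if expanded else [f for f in fields if f.core]
    return [f.label for f in visible]


register_schema("mods", [
    SearchField("titleInfo", "Title", core=True),
    SearchField("name", "Name", core=True),
    SearchField("genre", "Genre"),
    SearchField("physicalDescription", "Physical description"),
])

print(build_form("mods"))                 # ['Title', 'Name']
print(build_form("mods", expanded=True))  # ['Title', 'Name', 'Genre', 'Physical description']
```

Registering another schema is one call to register_schema(); build_form() itself never changes.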

Mark

Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Voice: 778.782.5753 / Fax: 778.782.3023
[EMAIL PROTECTED]

- "David Walker" <[EMAIL PROTECTED]> wrote:

> I'm working on an advanced search screen as part of our WorldCat API
> project.
> 
> WorldCat has dozens of indexes and a ton of limiters.  So many, in
> fact, that it's rather daunting trying to design it all in a way that
> isn't just a big dump of fields and check boxes that only a cataloger
> could decipher.
> 
> So I'm looking for examples of good advanced search screens (for
> bibliographic databases or otherwise) to gain some inspiration. 
> Thanks!
> 
> --Dave
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu


[CODE4LIB] Good advanced search screens

2008-11-14 Thread Walker, David
I'm working on an advanced search screen as part of our WorldCat API project.

WorldCat has dozens of indexes and a ton of limiters.  So many, in fact, that 
it's rather daunting trying to design it all in a way that isn't just a big 
dump of fields and check boxes that only a cataloger could decipher.

So I'm looking for examples of good advanced search screens (for bibliographic 
databases or otherwise) to gain some inspiration.  Thanks!

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu


Re: [CODE4LIB] Reference string parsing software available: ParsCit v080402

2008-11-14 Thread MJ Suhonos

Hi all,

John, the supplemented approach you describe is how we go about it in
our Lemon8-XML (L8X) software (http://pkp.sfu.ca/lemon8). L8X passes
the original unparsed string to a number of different parsers in turn
(FreeCite, each of the three ParaCite parsers, and a home-grown regex
parser), does a little cleaning and normalization, and then hands the
results to the user to select the correct values for each element.


Most of the time, it actually does a pretty good job of detecting the  
right elements -- in fact, numeric stuff like volume, issue, pages,  
etc. tend to be more accurate than names and titles, mostly because of  
the larger variance in the latter.  Our experience has been that  
relying on a single approach (machine-learning vs. format-rule-based  
vs. regular-expression) is less reliable than getting partial matches  
from various approaches, and then assembling them.  In this case, the  
whole is in fact greater than the sum of the parts.
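One way to picture "assembling partial matches" is a per-field tally across parsers. The sketch below assumes each parser has already returned a dict keyed by OpenURL 1.0 field names (atitle, volume, issue, spage, epage); the aggregate() helper and the sample outputs are hypothetical, not L8X code. Values are ranked by how many parsers agree, and the full candidate list is kept so a user can still pick the correct one.

```python
from collections import Counter


def aggregate(results: list[dict[str, str]]) -> dict[str, list[tuple[str, int]]]:
    """For each OpenURL field, list every candidate value with the number of
    parsers that produced it, most-agreed-upon first."""
    candidates: dict[str, Counter] = {}
    for result in results:
        for field_name, value in result.items():
            value = " ".join(value.split())  # light normalization: collapse whitespace
            if value:
                candidates.setdefault(field_name, Counter())[value] += 1
    # Counter.most_common() keeps first-seen order among equal counts.
    return {name: counts.most_common() for name, counts in candidates.items()}


# Hypothetical outputs from three different parsers for the same citation.
parser_outputs = [
    {"atitle": "Citation parsing at scale", "volume": "12", "spage": "45"},
    {"atitle": "Citation  parsing", "volume": "12", "issue": "3"},
    {"volume": "12", "issue": "3", "spage": "45", "epage": "67"},
]
merged = aggregate(parser_outputs)
print(merged["volume"])  # [('12', 3)]
print(merged["atitle"])  # [('Citation parsing at scale', 1), ('Citation parsing', 1)]
```

Numeric fields tend to converge on a single candidate, while titles split, which is where a human choosing among candidates still earns its keep.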


I haven't added the ParsCit web service explicitly since a SOAP-based  
interface is a bit more cumbersome in PHP than FreeCite's POST-type  
interface, but I'll make a point of doing so now.  Incrementally  
adding services that all map to the same citation elements (we use the  
OpenURL 1.0 fields, with a few aberrations) means it's very easy to  
increase the accuracy by simply adding another parsing plugin/service.
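The plugin idea above can be sketched as a registry in which each parser declares how its own output keys map onto the shared OpenURL field names. Everything here is hypothetical: the toy parsers stand in for real services such as FreeCite or ParsCit, and their native key names are invented for illustration.

```python
from typing import Callable

Parser = Callable[[str], dict[str, str]]

# Registered plugins, each already translated to OpenURL 1.0 field names.
PLUGINS: list[Parser] = []


def plugin(native_to_openurl: dict[str, str]) -> Callable[[Parser], Parser]:
    """Decorator: register a parser, renaming its native output keys to
    OpenURL names and dropping any key it has no mapping for."""
    def register(parse: Parser) -> Parser:
        def wrapped(raw: str) -> dict[str, str]:
            return {native_to_openurl[k]: v
                    for k, v in parse(raw).items() if k in native_to_openurl}
        PLUGINS.append(wrapped)
        return wrapped
    return register


@plugin({"vol": "volume", "num": "issue"})
def toy_parser_a(raw: str) -> dict[str, str]:
    # Hypothetical stand-in for a web-service call; returns its own key names.
    return {"vol": "12", "num": "3", "confidence": "0.8"}


@plugin({"volume": "volume", "first_page": "spage"})
def toy_parser_b(raw: str) -> dict[str, str]:
    return {"volume": "12", "first_page": "45"}


def run_all(raw: str) -> list[dict[str, str]]:
    """Send one raw citation string through every registered plugin."""
    return [p(raw) for p in PLUGINS]


print(run_all("Doe, J. Some title. 12(3):45-67, 2008."))
# [{'volume': '12', 'issue': '3'}, {'volume': '12', 'spage': '45'}]
```

Adding another parser is then one decorated function, and anything downstream only ever sees OpenURL keys.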


You'd have to pull out the relevant classes from L8X to get a  
standalone parser, but since this is one of the more appealing aspects  
of the software for many people, we're looking at making a simple API  
in L8X to just do the citation parsing, possibly without the UI to  
take it from semi-automated to completely automatic.


MJ

On 14-Nov-08, at 12:07 AM, Jonathan Rochkind wrote:

Thanks Min, this is a great project that I keep trying to find time
to investigate further. Don't apologize for keeping us updated; please
continue to!


Do you know if any of the changes have improved detection of
volume/issue/page# information? For what I want to use it for,
reasonably accurate parsing of volume/issue/page# is needed, and so
far whenever I've looked at demos, this seems to be something that
all of these machine-learning-type approaches do pretty awfully at.
(I wonder if you are not including this in your training much,
because it isn't necessary for your purposes to have volume/issue/page#?)


I have also wondered whether it would make sense to take a
machine-learning-type approach to begin with, but then supplement it
with formal-rule-based parsing to attempt to get vol/issue/page#
according to common patterns.
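A rough sketch of that supplement step, assuming the machine-learning parser has already produced a dict of OpenURL-style fields. Two regular expressions catch common volume/issue/page layouts and fill in whatever the first pass missed. The patterns and helper names are illustrative only and cover just two citation styles.

```python
import re

# Two common layouts (illustrative, far from exhaustive):
#   "12(3):45-67"                 compact style
#   "vol. 12, no. 3, pp. 45-67"   spelled-out style
# [-\u2013] accepts either a hyphen or an en dash between page numbers.
PATTERNS = [
    re.compile(r"(?P<volume>\d+)\s*\((?P<issue>\d+)\)\s*:\s*"
               r"(?P<spage>\d+)\s*[-\u2013]\s*(?P<epage>\d+)"),
    re.compile(r"vol\.?\s*(?P<volume>\d+),\s*no\.?\s*(?P<issue>\d+),\s*"
               r"pp?\.?\s*(?P<spage>\d+)\s*[-\u2013]\s*(?P<epage>\d+)",
               re.IGNORECASE),
]


def rule_based_numbers(raw: str) -> dict[str, str]:
    """Volume/issue/page fields from the first pattern that matches, else {}."""
    for pattern in PATTERNS:
        match = pattern.search(raw)
        if match:
            return match.groupdict()
    return {}


def supplement(ml_fields: dict[str, str], raw: str) -> dict[str, str]:
    """Keep the machine-learning output; fill in only the numeric fields it missed."""
    merged = dict(ml_fields)
    for key, value in rule_based_numbers(raw).items():
        merged.setdefault(key, value)
    return merged


raw = "Smith, J. Parsing references. Journal of Stuff 12(3):45-67, 2008."
ml_guess = {"aulast": "Smith", "atitle": "Parsing references", "volume": "12"}
print(supplement(ml_guess, raw))
# {'aulast': 'Smith', 'atitle': 'Parsing references', 'volume': '12', 'issue': '3', 'spage': '45', 'epage': '67'}
```

Filling gaps rather than overwriting keeps the first parser's answer when both produce a value; if the rule-based pass proves more reliable for numeric fields, reversing that precedence is a one-line change.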


I don't have much time to work on this myself, but if anyone working
on these various citation parsing efforts could improve
volume/issue/page# detection to a reasonable level, it would make the
libraries useful for a much greater range of applications.


Jonathan



Min-Yen Kan <[EMAIL PROTECTED]> 11/13/08 8:30 PM >>>

Dear all:

(Sorry to resurrect an old thread...)

We've seen the release of several new freely available reference
string parsers in recent months.
The ParsCit team has also been updating the ParsCit package, and is
happy to announce a new version that improves on classification
accuracy and adds training data in Italian, German and French, as well
as from a different discipline, the humanities. We've updated the
classification model to reflect these changes; it should be as easy to
use as the original ParsCit.

You can either download a copy of ParsCit for your own use, or use it
through a web services interface. We welcome your feedback and hope
that, if you use ParsCit or any other freely available reference string
parsing tool, you can contribute annotated data to help make these
models more robust.

ParsCit is available from: http://wing.comp.nus.edu.sg/parsCit/
Current Distribution: http://wing.comp.nus.edu.sg/parsCit/parscit-080917.zip

ParsCit is a joint collaboration between Pennsylvania State University
(the folks who brought you CiteSeerX) and the National University of
Singapore.

Cheers,

Min

P.S. Integration with other freely available parsing systems is
hopefully in the works too. If you have something to contribute, we'll
be happy to commit some bandwidth to getting it integrated with
ParsCit.


Re: [CODE4LIB] djatoka

2008-11-14 Thread Jon Stroop
Another possibility, if no one steps up with a presentation, would be a 
'hacking djatoka' [pre\-un]*conference activity.

-Jon

Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: [EMAIL PROTECTED]
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead



Kevin S. Clarke wrote:

On Fri, Nov 14, 2008 at 6:10 AM, Birkin James Diana
<[EMAIL PROTECTED]> wrote:

  

If any of you have had the good fortune to experiment with it or implement
it into some workflow, get over to the code4libcon09 presentation-proposal
page pronto! And if you're as jazzed about it as I am, and know it'll be as
big in our community as I think it will, consider a pre-conf proposal, too.



Birkin++

I'm very interested in this, too, and hope someone who has had a
chance to play with djatoka will volunteer to do a pres (or better a
pre-conf!)  You'd definitely get my vote.

Kevin


  


Re: [CODE4LIB] djatoka

2008-11-14 Thread Kevin S. Clarke
On Fri, Nov 14, 2008 at 6:10 AM, Birkin James Diana
<[EMAIL PROTECTED]> wrote:

> If any of you have had the good fortune to experiment with it or implement
> it into some workflow, get over to the code4libcon09 presentation-proposal
> page pronto! And if you're as jazzed about it as I am, and know it'll be as
> big in our community as I think it will, consider a pre-conf proposal, too.

Birkin++

I'm very interested in this, too, and hope someone who has had a
chance to play with djatoka will volunteer to do a pres (or better a
pre-conf!)  You'd definitely get my vote.

Kevin


-- 
There are two kinds of people in the world: those who believe there
are two kinds of people and those who know better.


Re: [CODE4LIB] djatoka

2008-11-14 Thread Birkin James Diana

On Nov 14, 2008, at 6:38 AM, John Fereira wrote:


...I've already got a session proposal submitted for Code4Lib...


My take on this is that while I'd like to have as wide a range of  
presenters as possible, a higher priority is a wide range of  
interesting presentations. Since many of us are working on multiple  
interesting things, I'd encourage anyone to consider a second  
presentation-proposal. There are lots of slots, the presentations are  
relatively brief, and I don't see any risk of insularity because the  
community as a whole chooses from among the proposals.


This looks very cool and considering that I've already been using  
aDORe as a repository for storing a large (very large) number of  
scanned images (in JPEG2000 format)...



I had never heard of aDORe before Ryan's djatoka talk. If only there  
were some event coming up where I could hear someone give a brief  
presentation on it.




:)

---
Birkin James Diana
Programmer, Integrated Technology Services
Brown University Library
[EMAIL PROTECTED]


Re: [CODE4LIB] Open Library Environment (OLE) Project - Regional Design Workshops

2008-11-14 Thread Anna Headley
Chin up, Jonathan.  I am hoping to attend one of these precisely because
I want to talk about how I /wish/ my workflow looked, and dream about
tools that would make my life easier - "ideas on what this type of core
system *should* incorporate."  I can't be the only one.


Anna H.


Jonathan Rochkind wrote:
I have to admit that I worry that too many of our libraries' business processes as currently practiced are completely irrational and nonsensical, and that modeling new requirements or systems on all of them, aggregated and averaged out, may not be optimal.

Certainly, you have to collect evidence about business process needs somehow. 

But how many of us have experienced library workflow that actually makes sense, instead of being habits built over years of having to do weird workarounds to work with systems that unreasonably constrained us, built on top of each other layer upon layer, combined in organizations siloed off so the right hand doesn't know what the left is doing? Sprinkle on top the natural inclination of most people to be creatures of habit who don't like changing their workflow unless forced, and the result is that I'm not even sure we know what makes sense anymore.


Jonathan


  

Tim McGeary <[EMAIL PROTECTED]> 11/13/08 3:43 PM >>>


John Fereira wrote:
  

Tim McGeary wrote:

The Open Library Environment (OLE, pronounced oh-lay) Project invites 
you to apply to participate in a two day Regional Design Workshop. The 
purpose of this workshop is to provide a forum for representatives of 
local research libraries and related institutions to discuss our work 
surrounding the current Integrated Library System and ideas on what 
this type of core system should incorporate.  Workshops are being held 
in a variety of locations in the US over the next 2 months. For more 
information and to find a location near you, go to:

http://oleproject.org/workshops.
  
That's quite a schedule of workshops.  I've been interested
in the project since John Little first mentioned it here.  On behalf of
the Spring 2008 JA-SIG conference committee I invited him (and he
accepted) to do a birds-of-a-feather session at the conference.  There
are some things I am working on that I think may fit well with the
project (I was also a developer for a piece of Kuali Rice, so I know
some of the Indiana folks), but with this many workshops I can't really
tell how they will inter-relate.  Since there are a few dates with
simultaneous workshops in different cities, it would seem to me
that some sort of video conference and a real-time collaborative system
(we used Macromedia Breeze for the Kuali project with developers at
Cornell and Indiana) would be useful.


With the current economy I know that travel budgets are undergoing a lot
of scrutiny (I've even heard of a very large university system out west
that may be halting all business travel for a while), so attending even
one of the workshops may be problematic.



John,

I hear you about the travel elements of this.  That is why this process 
will not just be closed off to these workshops.  We are hoping to have 
enough workshops to gather a wide range of business processes that we 
can sift through to find commonalities to model the core business 
practices.  On top of that, we will model the differences so that 
flexibility can be built into the OLE architecture.
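In miniature, "find commonalities, then model the differences" is set arithmetic over the steps each library reports. The library names and workflow steps below are invented for illustration and say nothing about what the OLE workshops actually collected.

```python
# Hypothetical workflow steps reported by three libraries.
workflows = {
    "Library A": {"select", "order", "receive", "catalog", "pay invoice"},
    "Library B": {"select", "order", "receive", "catalog", "label"},
    "Library C": {"select", "order", "receive", "pay invoice", "route to branch"},
}

# Core practices: the steps every participating library performs.
core = set.intersection(*workflows.values())

# Differences: each library's steps beyond the core, i.e. candidate places
# where the architecture needs configurable extension points.
differences = {lib: sorted(steps - core) for lib, steps in workflows.items()}

print(sorted(core))              # ['order', 'receive', 'select']
print(differences["Library C"])  # ['pay invoice', 'route to branch']
```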


There will be plenty of time and opportunities for public comment on the 
data gathered at the workshops and the models as they are completed 
before the architecture stage is complete.  So if you, or anyone, cannot 
attend a workshop, there will still be opportunity for comment, and we 
want and need it!


Thank you for your interest - and please encourage others who show 
interest to participate in any way that they can.


Cheers,
Tim
--
Tim McGeary
Senior Systems Specialist
Lehigh University
610-758-4998
[EMAIL PROTECTED]
Google Talk: timmcgeary
Yahoo IM: timmcgeary
  


--
Anna Headley
Swarthmore College Library
610.690.5781
[EMAIL PROTECTED] 


Re: [CODE4LIB] djatoka

2008-11-14 Thread John Fereira

Birkin James Diana wrote:
Yesterday I attended a session of the DLF Fall Forum at which Ryan Chute
presented on djatoka, the open-source JPEG 2000 image server he and
Herbert Van de Sompel just released.


It's very cool and near the top of my crowded list of things to play with.

If any of you have had the good fortune to experiment with it or 
implement it into some workflow, get over to the code4libcon09 
presentation-proposal page pronto! And if you're as jazzed about it as I 
am, and know it'll be as big in our community as I think it will, 
consider a pre-conf proposal, too.


-Birkin





This looks very cool and considering that I've already been using aDORe 
as a repository for storing a large (very large) number of scanned 
images (in JPEG2000 format) it's probably something that I should look at.


I've already got a session proposal submitted for Code4Lib, but I doubt
I'll be able to do anything for a pre-conference, since I also hope to be
attending the JA-SIG conference the week after (I'm on the planning
committee...again). With travel budgets getting cut right and left
and/or under extreme scrutiny, I'm going to have a difficult time as it
is justifying attending both conferences.  After Roy's recent posting
about the conference location I'm even more hyped about going to
Code4Lib, but as we're starting to build up the program for the JA-SIG
conference I'm getting more hyped about that as well.


[CODE4LIB] djatoka

2008-11-14 Thread Birkin James Diana
Yesterday I attended a session of the DLF Fall Forum at which Ryan
Chute presented on djatoka, the open-source JPEG 2000 image server he
and Herbert Van de Sompel just released.


It's very cool and near the top of my crowded list of things to play  
with.


If any of you have had the good fortune to experiment with it or  
implement it into some workflow, get over to the code4libcon09  
presentation-proposal page pronto! And if you're as jazzed about it as  
I am, and know it'll be as big in our community as I think it will,  
consider a pre-conf proposal, too.


-Birkin




---
Birkin James Diana
Programmer, Integrated Technology Services
Brown University Library
[EMAIL PROTECTED]


On Nov 13, 2008, at 9:54 AM, jean rainwater wrote:


Do you have an idea AND are you willing to organize a pre-conference
for Code4Lib 2009?

If so, please send your proposal to code4libcon at googlegroups.com.

Please include 1) a description of the pre-conference, 2)  whether a
full or half day time slot is needed, and 3) max number of
participants.