Nice work, Erik. I would like to spend more time playing with it, but I saw a few things I really liked. When a specific query turns up no results, you prompt the client to perform a free-form search. Less savvy search users will benefit from this strategy. I also like the display of information when you select a result. Everything is at your fingertips without clutter.
I did get this error when a name search failed to turn up results and I clicked 'help' in the free-form search row (the second row).

Here is my browser info:

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Below are the details from the error:

Page 'help-freeform.html' not found in application namespace.

Stack Trace:

  a.. org.apache.tapestry.resolver.PageSpecificationResolver.resolve(PageSpecificationResolver.java:120)
  b.. org.apache.tapestry.pageload.PageSource.getPage(PageSource.java:144)
  c.. org.apache.tapestry.engine.RequestCycle.getPage(RequestCycle.java:195)
  d.. org.apache.tapestry.engine.PageService.service(PageService.java:73)
  e.. org.apache.tapestry.engine.AbstractEngine.service(AbstractEngine.java:872)
  f.. org.apache.tapestry.ApplicationServlet.doService(ApplicationServlet.java:197)
  g.. org.apache.tapestry.ApplicationServlet.doGet(ApplicationServlet.java:158)
  h.. javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
  i.. javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
  j.. org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
  k.. org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
  l.. org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
  m.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  n.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  o.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  p.. org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  q.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  r.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  s.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  t.. org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2422)
  u.. org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
  v.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  w.. org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
  x.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
  y.. org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:163)
  z.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
  aa.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  ab.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  ac.. org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
  ad.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  ae.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  af.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  ag.. org.apache.ajp.tomcat4.Ajp13Processor.process(Ajp13Processor.java:457)
  ah.. org.apache.ajp.tomcat4.Ajp13Processor.run(Ajp13Processor.java:576)
  ai.. java.lang.Thread.run(Thread.java:534)

Luke

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene User" <lucene-user@jakarta.apache.org>
Sent: Friday, February 18, 2005 2:46 PM
Subject: Lucene in the Humanities

> It's about time I actually did something real with Lucene.... :)
>
> I have been working with the Applied Research in Patacriticism group at
> the University of Virginia for a few months and am finally ready to
> present what I've been doing. The primary focus of my group is working
> with the Rossetti Archive - poems, artwork, interpretations,
> collections, and so on of Dante Gabriel Rossetti.
> I was initially brought on to build a collection and exhibit system,
> though I got detoured a bit as I got involved in applying Lucene to
> the archive to replace their existing search system. The existing
> system used an old version of Tamino with XPath queries. Tamino is
> not at fault here, at least not entirely, because our data is in a
> very complicated set of XML files with a lot of non-normalized and
> legacy metadata - getting at things via XPath is challenging and
> practically impossible in many cases.
>
> My work is now presentable at
>
> http://www.rossettiarchive.org/rose
>
> (rose is for ROsetti SEarch)
>
> This system is implicitly designed for academics who are delving into
> Rossetti's work, so it may not be all that interesting for most of you.
> Have fun and send me any interesting things you discover, especially
> any issues you may encounter.
>
> Here are some numbers to give you a sense of what is going on
> underneath... There are currently 4,983 XML files, totaling about 110MB.
> Without getting into a lot of details of the confusing domain, there
> are basically 3 types of XML files (works, pictures, and transcripts).
> It is important that there be case-sensitive and case-insensitive
> searches. To accomplish that, a custom analyzer is used in two
> different modes, one applying a LowerCaseFilter and one not, with the
> same documents written to two different indexes. There is one
> particular type of XML file that gets indexed as two different types of
> documents (a specialized summary/header type). In this first set of
> indexes, it is basically a one-to-one mapping of XML file to Lucene
> Document (with one type being indexed twice in different ways) - all
> said there are 5539 documents in each of the two main indexes. The
> transcript type gets sliced into another set of original case and
> lowercased indexes with each document in that index representing a
> document division (a <div> element in the XML).
> There are 12326 documents in each of these <div>-level indexes. All
> said, the 4 indexes built total about 3GB in size - I'm storing
> several fields in order to hit-highlight. Only one of these indexes is
> being hit at a time - which index is used depends on the parameters
> you use when querying.
>
> Lucene brought the search times into a usable, and impressive to the
> scholars, state. The previous search solution often timed the browser
> out! Search results now are in the milliseconds range.
>
> The amount of data is tiny compared to most usages of Lucene, but
> things are getting interesting in other ways. There has been little
> tuning in terms of ranking quality so far, but this is the next area of
> work. There is one document type that is more important than the
> others, and it is being boosted during indexing. There is now a
> growing interest in tinkering with all the new knobs and dials that are
> now possible. Putting in similar and more-like-this features is
> desired and will be relatively straightforward to implement. I'm
> currently using a catch-all-aggregate-field technique for a default
> field for QueryParser searching. Using a multi-field expansion is an
> area that is desirable instead though. So, I've got my homework to do
> and catch up on all the goodness that has been mentioned in this list
> recently regarding all of these techniques.
>
> An area where I'd like to solicit more help from the community relates
> to something akin to personalization. The scholars would like to be
> able to tune results based on the role (such as "art historian") that
> is searching the site. This would involve some type of training or
> continual learning process so that someone searching feeds back
> preferences implicitly for their queries by visiting the actual
> documents that are of interest.
> Now that the scholars have seen what is possible (I showed them the
> cool SearchMorph comparison page searching Wikipedia for "rossetti"),
> they want more and more!
>
> So - here's where I'm soliciting feedback - who's doing these types of
> things in the realm of Humanities? Where should we go from here in
> terms of researching and applying the types of features dreamed about
> here? How would you recommend implementing these types of features?
>
> I'd be happy to share more about what I've done under the covers. As
> you may be able to tell, the web UI is Tapestry for the search and
> results pages (though you won't be able to tell from the URLs you'll
> see :). The UI was designed primarily by one of our very graphical/CSS
> savvy post doc research associates, and was designed with the research
> scholar in mind. I continue to push for the "free form" search box to
> transcend structural search, but there are some good scholarly reasons
> to have focused structural searching also. The documents beyond the
> links in the search results were another area where I've applied some
> elbow grease - they were originally dynamically generated with hideous
> URLs through Tamino and a Saxon transformation servlet. These are
> documents that _rarely_ change - so they are now being XSL'd into all
> sorts of HTML views during a build process and statically served with
> Apache with quite clean URLs (and Google friendly, which,
> interestingly, is not something scholars care that much about for these
> types of archives it seems). The XML files are accessible directly as
> hyperlinks at the bottoms of the document pages too - so you can see
> the parsing fun I've had (thank you JDOM and XPath!).
>
> Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]