Re: Lucene in the Humanties

Luke Shannon Fri, 18 Feb 2005 12:20:39 -0800

Nice work Erik. I would like to spend more time playing with it, but I saw a
few things I really liked. When a specific query turns up no results, you
prompt the client to perform a free-form search. Less savvy search users
will benefit from this strategy. I also like the display of information when
you select a result. Everything is at your fingertips without clutter.


I did get this error when a name search failed to turn up results and I
clicked 'help' in the free form search row (the second row).

Here is my browser info:

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107
Firefox/1.0

Below are the details from the error:

 Page 'help-freeform.html' not found in application namespace.

 Stack Trace:
  a.. org.apache.tapestry.resolver.PageSpecificationResolver.resolve(PageSpecificationResolver.java:120)
  b.. org.apache.tapestry.pageload.PageSource.getPage(PageSource.java:144)
  c.. org.apache.tapestry.engine.RequestCycle.getPage(RequestCycle.java:195)
  d.. org.apache.tapestry.engine.PageService.service(PageService.java:73)
  e.. org.apache.tapestry.engine.AbstractEngine.service(AbstractEngine.java:872)
  f.. org.apache.tapestry.ApplicationServlet.doService(ApplicationServlet.java:197)
  g.. org.apache.tapestry.ApplicationServlet.doGet(ApplicationServlet.java:158)
  h.. javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
  i.. javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
  j.. org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
  k.. org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
  l.. org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
  m.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  n.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  o.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  p.. org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  q.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  r.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  s.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  t.. org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2422)
  u.. org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
  v.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  w.. org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
  x.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
  y.. org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:163)
  z.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
  aa.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  ab.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  ac.. org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
  ad.. org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
  ae.. org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
  af.. org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
  ag.. org.apache.ajp.tomcat4.Ajp13Processor.process(Ajp13Processor.java:457)
  ah.. org.apache.ajp.tomcat4.Ajp13Processor.run(Ajp13Processor.java:576)
  ai.. java.lang.Thread.run(Thread.java:534)

Luke

----- Original Message ----- 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene User" <lucene-user@jakarta.apache.org>
Sent: Friday, February 18, 2005 2:46 PM
Subject: Lucene in the Humanties


> It's about time I actually did something real with Lucene....  :)
>
> I have been working with the Applied Research in Patacriticism group at
> the University of Virginia for a few months and am finally ready to
> present what I've been doing.  The primary focus of my group is working
> with the Rossetti Archive - poems, artwork, interpretations,
> collections, and so on of Dante Gabriel Rossetti.  I was initially
> brought on to build a collection and exhibit system, though I got
> detoured a bit as I got involved in applying Lucene to the archive to
> replace their existing search system.  The existing system used an old
> version of Tamino with XPath queries.  Tamino is not at fault here, at
> least not entirely, because our data is in a very complicated set of
> XML files with a lot of non-normalized and legacy metadata - getting at
> things via XPath is challenging and practically impossible in many
> cases.
>
> My work is now presentable at
>
> http://www.rossettiarchive.org/rose
>
> (rose is for ROssetti SEarch)
>
> This system is implicitly designed for academics who are delving into
> Rossetti's work, so it may not be all that interesting for most of you.
> Have fun and send me any interesting things you discover, especially
> any issues you may encounter.
>
> Here are some numbers to give you a sense of what is going on
> underneath... There are currently 4,983 XML files, totaling about 110MB.
> Without getting into a lot of details of the confusing domain, there
> are basically 3 types of XML files (works, pictures, and transcripts).
> It is important that there be case-sensitive and case-insensitive
> searches.  To accomplish that, a custom analyzer is used in two
> different modes, one applying a LowerCaseFilter and one not, with the
> same documents written to two different indexes.  There is one
> particular type of XML file that gets indexed as two different types of
> documents (a specialized summary/header type).  In this first set of
> indexes, it is basically a one-to-one mapping of XML file to Lucene
> Document (with one type being indexed twice in different ways) - all
> said there are 5539 documents in each of the two main indexes.  The
> transcript type gets sliced into another set of original case and
> lowercased indexes with each document in that index representing a
> document division (a <div> element in the XML).  There are 12326
> documents in each of these <div>-level indexes.  All said, the 4
> indexes built total about 3GB in size - I'm storing several fields in
> order to hit-highlight.  Only one of these indexes is hit at a
> time - which one depends on the parameters you use when querying.
>
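
The two-mode analyzer setup described above can be sketched in plain Java.
This is only a toy illustration under stated assumptions, not the code behind
the archive and not Lucene's API: ToyIndex, its tokenizing rule, and the
sample titles are all hypothetical. The one thing it takes from the message
is the idea that the same documents go into two indexes, one whose analysis
lowercases terms and one whose analysis does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for one index: maps each term to the ids of documents containing it. */
final class ToyIndex {
    private final boolean lowerCase; // true = case-insensitive mode
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    ToyIndex(boolean lowerCase) {
        this.lowerCase = lowerCase;
    }

    /** Splits on non-letter/non-digit runs; lowercases only in case-insensitive mode. */
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.add(lowerCase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Single-term lookup; the query word goes through the same analysis as the documents. */
    Set<Integer> search(String word) {
        List<String> terms = analyze(word);
        if (terms.isEmpty()) {
            return new TreeSet<>();
        }
        return postings.getOrDefault(terms.get(0), new TreeSet<>());
    }
}

final class TwoModeSearchDemo {
    public static void main(String[] args) {
        ToyIndex exact = new ToyIndex(false);  // original-case index
        ToyIndex folded = new ToyIndex(true);  // lowercased index
        String[] docs = {"The Blessed Damozel", "a blessed memory"}; // sample text only
        for (int id = 0; id < docs.length; id++) {
            exact.add(id, docs[id]);   // the same documents are written to both indexes
            folded.add(id, docs[id]);
        }
        System.out.println("case-sensitive:   " + exact.search("Blessed"));
        System.out.println("case-insensitive: " + folded.search("Blessed"));
    }
}
```

The point of the design is that a query is analyzed the same way as the index
it is sent to, so choosing case sensitivity amounts to choosing which index
to search.
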
> Lucene brought the search times into a state that is usable and, to
> the scholars, impressive.  The previous search solution often timed the
> browser out!  Search results now are in the milliseconds range.
>
> The amount of data is tiny compared to most usages of Lucene, but
> things are getting interesting in other ways.   There has been little
> tuning in terms of ranking quality so far, but this is the next area of
> work.  There is one document type that is more important than the
> others, and it is being boosted during indexing.  There is now a
> growing interest in tinkering with all the new knobs and dials that are
> now possible.  Adding similar and more-like-this features is
> desired and will be relatively straightforward to implement.  I'm
> currently using a catch-all aggregate field technique for a default
> field for QueryParser searching.  Using a multi-field expansion instead
> is desirable, though.  So, I've got my homework to do and
> catch up on all the goodness that has been mentioned in this list
> recently regarding all of these techniques.
>
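
The difference between the two default-field approaches mentioned above can
be sketched in plain Java. This is a minimal illustration under assumptions,
not the archive's code: the class, the field names ("title", "notes",
"contents"), and the boost value are hypothetical. The only Lucene-specific
detail relied on is the QueryParser syntax for fielded terms (field:term),
OR, and boosts (^).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DefaultFieldStrategies {

    /** Index-time approach: copy every field value into one catch-all field. */
    static Map<String, String> withCatchAll(Map<String, String> doc, String catchAllField) {
        Map<String, String> copy = new LinkedHashMap<>(doc);
        copy.put(catchAllField, String.join(" ", doc.values()));
        return copy;
    }

    /** Query-time approach: rewrite a bare term as an OR over named fields, with optional boosts. */
    static String expand(String term, Map<String, Double> fieldBoosts) {
        StringJoiner or = new StringJoiner(" OR ", "(", ")");
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0) {
                clause += "^" + e.getValue(); // a boost of 1.0 is the default, so it is left out
            }
            or.add(clause);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Jenny");
        doc.put("notes", "dramatic monologue");
        System.out.println(withCatchAll(doc, "contents"));

        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0);
        boosts.put("notes", 1.0);
        System.out.println(expand("rossetti", boosts));
    }
}
```

The catch-all field keeps queries simple but loses track of which field
matched; the expanded form keeps fields separate, which is what makes
per-field weighting possible.
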
> An area where I'd like to solicit more help from the community relates
> to something akin to personalization.  The scholars would like to be
> able to tune results based on the role (such as "art historian") that
> is searching the site.  This would involve some type of training or
> continual learning process so that someone searching feeds back
> preferences implicitly for their queries by visiting the actual
> documents that are of interest.  Now that the scholars have seen what
> is possible (I showed them the cool SearchMorph comparison page
> searching Wikipedia for "rossetti"), they want more and more!
>
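
One simple way to picture the role-based feedback idea above is a re-ranking
step on top of the search engine's scores. The sketch below is an assumption
for illustration only, not something from the archive or from Lucene: the
class name, the visit counter, and the 1 + log(1 + visits) multiplier are all
hypothetical choices. It records which documents searchers in a given role
actually open and nudges those documents upward for that role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RoleFeedbackRanker {
    // role -> (document id -> number of times a searcher in that role opened it)
    private final Map<String, Map<String, Integer>> visits = new HashMap<>();

    /** Implicit feedback: called whenever a searcher in the given role opens a document. */
    void recordVisit(String role, String docId) {
        visits.computeIfAbsent(role, r -> new HashMap<>()).merge(docId, 1, Integer::sum);
    }

    /** Scales the engine's score; the log keeps heavily visited documents from dominating. */
    double adjust(String role, String docId, double baseScore) {
        Map<String, Integer> forRole = visits.getOrDefault(role, Collections.emptyMap());
        int n = forRole.getOrDefault(docId, 0);
        return baseScore * (1.0 + Math.log1p(n));
    }

    /** Returns document ids ordered by adjusted score, highest first. */
    List<String> rerank(String role, Map<String, Double> baseScores) {
        List<String> ids = new ArrayList<>(baseScores.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> adjust(role, id, baseScores.get(id))).reversed());
        return ids;
    }

    public static void main(String[] args) {
        RoleFeedbackRanker ranker = new RoleFeedbackRanker();
        ranker.recordVisit("art historian", "picture-12");
        ranker.recordVisit("art historian", "picture-12");

        Map<String, Double> scores = new HashMap<>();
        scores.put("poem-3", 1.0);      // hypothetical ids and scores
        scores.put("picture-12", 0.9);

        System.out.println(ranker.rerank("art historian", scores));
        System.out.println(ranker.rerank("literary critic", scores));
    }
}
```

A role with no recorded visits falls back to the engine's own ordering, so
the adjustment only takes effect once feedback has accumulated.
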
> So - here's where I'm soliciting feedback - who's doing these types of
> things in the realm of Humanities?  Where should we go from here in
> terms of researching and applying the types of features dreamed about
> here?  How would you recommend implementing these types of features?
>
> I'd be happy to share more about what I've done under the covers.  As
> you may be able to tell, the web UI is Tapestry for the search and
> results pages (though you won't be able to tell from the URLs you'll
> see :).  The UI was designed primarily by one of our very graphical/CSS
> savvy post doc research associates, and was designed with the research
> scholar in mind.  I continue to push for the "free form" search box to
> transcend structural search, but there are some good scholarly reasons
> to have focused structural searching also.  The documents beyond the
> links in the search results were another area where I've applied some
> elbow grease - they were originally dynamically generated with hideous
> URLs through Tamino and a Saxon transformation servlet.  These are
> documents that _rarely_ change - so they are now being XSL'd into all
> sorts of HTML views during a build process and statically served with
> Apache with quite clean URLs (and Google friendly, which,
> interestingly, is not something scholars care that much about for these
> types of archives it seems).  The XML files are accessible directly as
> hyperlinks at the bottoms of the document pages too - so you can see
> the parsing fun I've had (thank you JDOM and XPath!).
>
> Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


