I'm not sure what all of the 'advanced features' mean, either.

Phonetic Searching - probably not important to this application.

Synonym searching might be desirable, but now that I'm thinking about it,
also likely not important.

Associated Words - sounds very interesting; a search for 'gold' might also
return 'metal', etc.

But Drill Down searching is very desirable. It's where you're able to search
within the results of a previous search. I'm assuming that I'll have to
implement that myself, by keeping a copy of the previous Hits list, and only
returning results that are in both lists.
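
Here is a minimal sketch of that intersection idea, in plain Java with no
Lucene calls. It assumes each hit can be reduced to some stable document
identifier; the String ids and the DrillDown/narrow names are hypothetical
and only for illustration. It keeps the order of the newer list, so whatever
ranking the second search produced is preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DrillDown {

    /**
     * Returns the entries of currentHits that also appear in previousHits,
     * in the order they occur in currentHits.
     */
    public static List<String> narrow(List<String> previousHits,
                                      List<String> currentHits) {
        // A hash set gives constant-time membership checks for the old results.
        Set<String> previous = new HashSet<>(previousHits);
        List<String> narrowed = new ArrayList<>();
        for (String id : currentHits) {
            if (previous.contains(id)) {
                narrowed.add(id);
            }
        }
        return narrowed;
    }

    public static void main(String[] args) {
        // Hypothetical document ids standing in for two successive searches.
        List<String> firstSearch = Arrays.asList("doc3", "doc7", "doc1", "doc9");
        List<String> secondSearch = Arrays.asList("doc9", "doc4", "doc3");
        System.out.println(narrow(firstSearch, secondSearch)); // prints [doc9, doc3]
    }
}
```

Another option might be to combine the previous query and the new one so
that both are required, and let the search engine do the narrowing, though
the ranking would then reflect both queries rather than just the second.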

Thanks very much for your reply.

----- Original Message ----- 
From: "Steven J. Owens" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 04, 2003 3:02 AM
Subject: Re: Lucene features


> On Wed, Sep 03, 2003 at 02:42:48PM -0400, Chris Sibert wrote:
> > Lucene Users List <[EMAIL PROTECTED]>
> > > > I am wondering if Lucene is the way to go for my project.
> > >      Probably.  Tell us a little about your project.
> >
> > It's pretty basic. I'm just indexing 4 large text files, ranging up to
> > 100MB in size. They don't ever change, and are on a CD-ROM. Each file
> > contains a bunch of small documents. I just create one index for all 4
> > of them. These documents are for an association that I belong to - they
> > contain a history of the association's documents - and my application
> > allows you to search them.
>
>      Well, aside from your concerns about the second list, Lucene
> seems perfect for your needs.  You'd parse apart the four big files
> into a bunch of small documents, then parse those small documents and
> create lucene Documents, containing Fields, and add them to the index.
>
> > They are actually currently indexed by an application called
> > 'Sonar', by Virginia Systems. But I REALLY didn't like using their
> > user interface - blech - so I decided to write a new interface for
> > my own use. But Sonar costs some real bucks to be able to develop
> > against their search API, so I found Lucene, and decided to go with
> > it.
> >
> > Here are the search features that 'Sonar' has :
> >   Boolean Searching
> >   Proximity Searching
> >   Wild Card Searching
> >   Field/Block Searching
>
>      I'm not sure what Field/Block means.  Boolean, Proximity and
> WildCard, are pretty typical in Lucene searches.  You should probably
> take a look at the Query Parser syntax docs:
>
>      http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
>
>
> >   Relevancy Ranking / Date Ranking
>
>      Lucene search results are typically ranked by relevance, and you
> can tweak the search to adjust this (there's a fair bit of discussion
> of this in the lucene-user archives; good keywords to look for are
> "slop" and "boost").
>
>      Sorting output by date might take some finesse.  I haven't played
> with sorting by date, but I'd expect to handle that by directly
> instantiating a QueryTerm to indicate the date issues.
>
> >   List of Occurrences in Context
>
>      I assume here that you mean displaying the results with a little
> snapshot of the text around it.  There have been discussions about how
> best to do this (often focused around highlighting the search terms in
> the displayed text) on the lucene-users list.  Check the list archive.
>
> >   Phonetic Searching
>
>      I'd guess you need to build this one yourself, perhaps by using a
> soundex algorithm when indexing the original data files.
>
> >   Synonyms/Concepts
>
>      Likewise... you'd need to come up with some sort of ontology of
> synonyms and concepts, then parse the fields you're indexing and
> generate a synonym/concept field that you'd add to the lucene
> Document.
>
> >   Relational Searching
> >   Associated Words
> >   Drill Down Search Narrowing
>
>      I'm not sure what these three mean.
>
> > I think that Lucene has all the features in the first group. How does it
> > stack up against the second group?
>
>      I'm afraid I haven't been too helpful here.  Perhaps if you
> clarify what the above mean, folks can post about how to implement it
> in Lucene.
>
> > I'm writing the whole thing in Swing, which has been time consuming,
> > and so have invested quite a bit of time into this project. But I'm
> > seeing the end of the tunnel, and want to make sure that I'm going
> > down the right path before I spend too much more time on it.
>
>      It sounds like you ought to at least seriously consider using
> Lucene, if you can find or implement equivalent features, or decide
> you can live without them.
>
> -- 
> Steven J. Owens
> [EMAIL PROTECTED]
>
> "I'm going to make broad, sweeping generalizations and strong,
>  declarative statements, because otherwise I'll be here all night and
>  this document will be four times longer and much less fun to read.
>  Take it all with a grain of salt." - Me at http://darksleep.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

