I'm not sure what all of the 'advanced features' were, either.

Phonetic Searching - probably not important to this application.

Synonym searching might be desirable, but now that I'm thinking about
it, also likely not important.

Associated Words - sounds very interesting, like 'gold' might return
'metal' also, etc.

But Drill Down searching is very desirable. It's where you're able to
search within the results of a previous search. I'm assuming that I'll
have to implement that myself, by keeping a copy of the previous Hits
list, and only returning results that are in both lists.

Thanks very much for your reply.

----- Original Message -----
From: "Steven J. Owens" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 04, 2003 3:02 AM
Subject: Re: Lucene features

> On Wed, Sep 03, 2003 at 02:42:48PM -0400, Chris Sibert wrote:
> > Lucene Users List <[EMAIL PROTECTED]>
> > > > I am wondering if Lucene is the way to go for my project.
> > > Probably. Tell us a little about your project.
> >
> > It's pretty basic. I'm just indexing 4 large text files, ranging up to 100MB
> > in size. They don't ever change, and are on a CD-ROM. Each file contains a
> > bunch of small documents. I just create one index for all 4 of them. These
> > documents are for an association that I belong to - they contain a history
> > of the association's documents - and my application allows you to search
> > them.
>
> Well, aside from your concerns about the second list, Lucene
> seems perfect for your needs. You'd parse apart the four big files
> into a bunch of small documents, then parse those small documents and
> create lucene Documents, containing Fields, and add them to the index.
>
> > They are actually currently indexed by an application called
> > 'Sonar', by Virginia Systems. But I REALLY didn't like using their
> > user interface - blech - so I decided to write a new interface for
> > my own use. But Sonar costs some real bucks to be able to develop
> > against their search API, so I found Lucene, and decided to go with
> > it.
> >
> > Here are the search features that 'Sonar' has :
> > Boolean Searching
> > Proximity Searching
> > Wild Card Searching
> > Field/Block Searching
>
> I'm not sure what Field/Block means. Boolean, Proximity and
> WildCard are pretty typical in Lucene searches. You should probably
> take a look at the Query Parser syntax docs:
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
>
> > Relevancy Ranking / Date Ranking
>
> Lucene search results are typically ranked by relevance, and you
> can tweak the search to adjust this (there's a fair bit of discussion
> of this in the lucene-user archives; good keywords to look for are
> "slop" and "boost").
>
> Sorting output by date might take some finesse. I haven't played
> with sorting by date, but I'd expect to handle that by directly
> instantiating a QueryTerm to indicate the date issues.
>
> > List of Occurrences in Context
>
> I assume here that you mean displaying the results with a little
> snapshot of the text around them. There have been discussions about how
> best to do this (often focused around highlighting the search terms in
> the displayed text) on the lucene-users list. Check the list archive.
>
> > Phonetic Searching
>
> I'd guess you need to build this one yourself, perhaps by using a
> soundex algorithm when indexing the original data files.
>
> > Synonyms/Concepts
>
> Likewise... you'd need to come up with some sort of ontology of
> synonyms and concepts, then parse the fields you're indexing and
> generate a synonym/concept field that you'd add to the lucene
> Document.
>
> > Relational Searching
> > Associated Words
> > Drill Down Search Narrowing
>
> I'm not sure what these three mean.
>
> > I think that Lucene has all the features in the first group. How does it
> > stack up against the second group ?
>
> I'm afraid I haven't been too helpful here. Perhaps if you
> clarify what the above mean, folks can post about how to implement them
> in Lucene.
>
> > I'm writing the whole thing in Swing, which has been time consuming,
> > and so have invested quite a bit of time into this project. But I'm
> > seeing the end of the tunnel, and want to make sure that I'm going
> > down the right path before I spend too much more time on it.
>
> It sounds like you ought to at least seriously consider using
> Lucene, if you can find or implement equivalent features, or decide
> you can live without them.
>
> --
> Steven J. Owens
> [EMAIL PROTECTED]
>
> "I'm going to make broad, sweeping generalizations and strong,
> declarative statements, because otherwise I'll be here all night and
> this document will be four times longer and much less fun to read.
> Take it all with a grain of salt." - Me at http://darksleep.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
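
P.S. For the drill-down idea in the reply at the top (keep the previous
hits, and only return results that are in both lists), here is a rough
sketch of the intersection step. It is plain Java and makes no Lucene
calls: the integer ids are a hypothetical stand-in for whatever
identifies a hit in the application, and the class and method names
(DrillDown, narrow) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DrillDown {

    /**
     * Returns the ids from currentHits that also appear in previousHits.
     * The order of currentHits is preserved, so the ranking of the newer
     * search is what the user sees.
     */
    public static List<Integer> narrow(List<Integer> previousHits,
                                       List<Integer> currentHits) {
        // A set gives constant-time membership checks on the earlier results.
        Set<Integer> previous = new HashSet<>(previousHits);
        List<Integer> narrowed = new ArrayList<>();
        for (Integer id : currentHits) {
            if (previous.contains(id)) {
                narrowed.add(id);
            }
        }
        return narrowed;
    }

    public static void main(String[] args) {
        // Hypothetical document ids returned by two successive searches.
        List<Integer> first = Arrays.asList(3, 7, 12, 20);
        List<Integer> second = Arrays.asList(20, 5, 7, 41);
        System.out.println(narrow(first, second)); // prints [20, 7]
    }
}
```

Another option that may be worth trying is to AND the new query with
the previous one, so that the index does the narrowing and no copy of
the old results has to be kept around.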