Grant Ingersoll <[EMAIL PROTECTED]> wrote on 08/12/2007 16:02:31:

>
> On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote:
>
> >>> Sometimes, when something like this comes up, it gives you the
> >>> opportunity to take a step back and ask what we really want
> >>> Lucene to be going forward (the New Year is good for this kind
> >>> of assessment as well).  What are its strengths and
> >>> weaknesses?  What can we improve in the short term and what needs
> >>> to improve in the longer term?  Maybe it's just that time of year
> >>> to send out your Lucene Wish List... :-)
> >
> > +1
> >
> > There is still something for us to learn & improve in Lucene, even
> > if the comparison is necessarily apples/oranges or unfair.
> >
> > Lucene was listed as not having "Result Excerpt", which isn't really
> > fair, though it is true you have to pull in contrib/highlighter to
> > enable it.
>
> Yeah, I noted that mentally, but didn't think it was a big deal since
> not everyone wants it.  The other thing is, some of it comes down to
> how you structure your content.  I think a lot of people use metadata
> fields to provide enough "summary" info about a document.
>
> >
> >
> > >> Did it crash on the 10 GB collection? I thought it said that it
> > >> just took way too long (7 times the best or something). Frankly,
> > >> either case is suspect. Last summer I indexed about 5 million docs
> > >> with a total size of at the *very* least 10 GB on my 3-year-old
> > >> desktop. It didn't take much more than 8 hours to index, and
> > >> searches were still lightning fast. Maybe they forgot to give the
> > >> JVM more than the default amount of RAM <g>
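Taking the figures above at face value (about 5 million docs, at least 10 GB, roughly 8 hours), a quick back-of-the-envelope check of the throughput they imply. Both inputs are approximate, so treat these as rough magnitudes rather than measurements:

```latex
\frac{5\,000\,000\ \text{docs}}{8 \times 3600\ \text{s}} \approx 174\ \text{docs/s},
\qquad
\frac{10\,240\ \text{MB}}{28\,800\ \text{s}} \approx 0.36\ \text{MB/s},
\qquad
\frac{10\,240\ \text{MB}}{5\,000\,000\ \text{docs}} \approx 2\ \text{KB/doc}
```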
> >
> > The paper just said "ht://Dig and Lucene degraded considerably their
> > indexing time, and we excluded them from the final comparison".
> >
> > Maybe Lucene just hit a very large segment merge, and the author
> > incorrectly thought something had gone wrong since the addDocument
> > call was taking incredibly long?  If so, the new default
> > ConcurrentMergeScheduler should improve that.  I would expect Lucene
> > 2.3 to now have an advantage in that it makes use of concurrency in
> > the hardware out of the box, whereas other, older engines are likely
> > single-threaded.
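To make the merge explanation concrete, here is a toy model in plain Java (not Lucene code; every class name is made up) of the difference between running a merge on the thread that called addDocument and handing it to background threads, which is the idea behind ConcurrentMergeScheduler. It only records which thread ran each merge; a real merge would rewrite segment files and could take a long time, which is exactly what would make a blocking addDocument look "stuck".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model, NOT Lucene's implementation: merge work goes to a pluggable scheduler.
interface ToyMergeScheduler {
    void merge(Runnable mergeWork);
    void close();
}

// Runs each merge on the calling thread, so addDocument blocks until it is done.
final class ToySerialScheduler implements ToyMergeScheduler {
    public void merge(Runnable mergeWork) { mergeWork.run(); }
    public void close() { }
}

// Runs merges on background threads, so addDocument returns right away.
final class ToyConcurrentScheduler implements ToyMergeScheduler {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public void merge(Runnable mergeWork) { pool.execute(mergeWork); }

    public void close() {
        pool.shutdown(); // finish queued merges, accept no new ones
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class ToyWriter {
    private final ToyMergeScheduler scheduler;
    private final int docsPerMerge;
    private int buffered = 0;
    // Name of the thread that ran each merge, recorded for inspection.
    final List<String> mergeThreads = Collections.synchronizedList(new ArrayList<>());

    ToyWriter(ToyMergeScheduler scheduler, int docsPerMerge) {
        this.scheduler = scheduler;
        this.docsPerMerge = docsPerMerge;
    }

    void addDocument(String doc) {
        buffered++;
        if (buffered == docsPerMerge) { // pretend a merge is now due
            buffered = 0;
            // A real merge would rewrite segment files here.
            scheduler.merge(() -> mergeThreads.add(Thread.currentThread().getName()));
        }
    }

    void close() { scheduler.close(); }
}
```

With the serial scheduler every merge runs on the indexing thread; with the concurrent one none of them do.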
>
> Yep.
>
> >
> >
> > I've also thought about creating a simple, optional threaded layer on
> > top of IndexWriter that uses multiple threads to add documents under
> > the hood.  Such a class would expose all of the methods of
> > IndexWriter (so it would feel just like IndexWriter), except that
> > calls to addDocument/updateDocument would drop into a queue that
> > multiple threads (maintained by this class) would pull from and
> > execute.  This would let Lucene make use of even more concurrency
> > ... and would spare application writers the "complexity" of
> > managing threads above Lucene.
>
> +1  I have been thinking about this too.  Solr clearly demonstrates
> the benefits of this kind of approach, although even it doesn't make
> it seamless for users in the sense that they still need to divvy up
> the docs on the app side.

It would be nice if this layer also took care of refreshing and warming
searchers/readers.
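A minimal sketch of the threaded layer discussed above, assuming a thread-safe sink in place of IndexWriter. DocumentSink and ThreadedIndexer are hypothetical names, not Lucene classes. addDocument only enqueues; a fixed pool of worker threads owned by the class drains the queue, and close() waits for the backlog and rethrows the first failure any worker hit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for IndexWriter.addDocument(Document). Implementations
// must be safe to call from several threads at once, as IndexWriter is.
interface DocumentSink<D> {
    void addDocument(D doc);
}

// Sketch of a threaded front-end: addDocument() only enqueues the document;
// a fixed pool of worker threads owned by this class drains the queue.
final class ThreadedIndexer<D> implements AutoCloseable {
    private final DocumentSink<D> sink;
    private final ExecutorService workers;
    // First failure seen by any worker, rethrown from close().
    private final AtomicReference<RuntimeException> firstError = new AtomicReference<>();

    ThreadedIndexer(DocumentSink<D> sink, int numThreads) {
        this.sink = sink;
        // A fixed pool queues submitted tasks in an unbounded LinkedBlockingQueue.
        this.workers = Executors.newFixedThreadPool(numThreads);
    }

    void addDocument(D doc) {
        workers.execute(() -> {
            try {
                sink.addDocument(doc);
            } catch (RuntimeException e) {
                firstError.compareAndSet(null, e); // keep only the first one
            }
        });
    }

    // Waits for every queued document to be added, then reports the first error.
    @Override
    public void close() {
        workers.shutdown();
        try {
            workers.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        RuntimeException error = firstError.get();
        if (error != null) throw error;
    }
}
```

updateDocument could be wrapped the same way. Refreshing and warming searchers/readers is not handled here; one option might be to trigger it from close() or every N documents. The queue is unbounded, so a real version would probably want back-pressure (e.g. a bounded queue) to keep memory in check.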

>
> Here's some of my wishes:
>
> 1. Better Demo
>
> 2. Alternate scoring algorithms (which implies changes to indexing
> too) that perform at or near the same level as the current ones

+1

>
> 3. A way of announcing improvements to interfaces so that we are
> better able to add methods to interfaces, knowing full well it will
> break some people.  The same goes for deprecations.  In this day and
> age of agile programming, it seems a bit restrictive to me that we
> wait 1+ years (the average time between major releases) to remove
> what we consider to be cruft in our code or to add new capabilities
> to interfaces.  I would suggest we announce a deprecated method,
> version it, mark when it is going away (e.g. "This will be removed in
> version 2.6") and then remove it in that version.  So, if we deprecate
> something in 2.3, we could, assuming consecutively numbered releases,
> remove it in 2.5.  This would presumably shorten the wait to about
> 6 months.  Just a thought...  :-)
>
> -Grant
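One way the proposal in item 3 could look in code, as a sketch only: a hypothetical DeprecationPolicy helper that encodes "remove two minor releases after deprecation" (2.3 to 2.5, and 2.4 to the "2.6" in the example), plus a hypothetical ExampleApi method whose @deprecated Javadoc names the removal version. The since/forRemoval attributes on @Deprecated were added in Java 9, well after this thread.

```java
// Hypothetical helper encoding the proposed policy: an API deprecated in
// release X.Y is removed two minor releases later (2.3 -> 2.5).
// It ignores major-version rollovers (e.g. what comes after 2.9).
final class DeprecationPolicy {
    static final int MINOR_RELEASES_UNTIL_REMOVAL = 2;

    static String removalVersion(String deprecatedIn) {
        String[] parts = deprecatedIn.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major + "." + (minor + MINOR_RELEASES_UNTIL_REMOVAL);
    }
}

// How such an API might be marked. Class and method names are hypothetical;
// the since/forRemoval attributes need Java 9 or later.
final class ExampleApi {
    /**
     * @deprecated since 2.3; use {@link #newSearch()} instead.
     *             This will be removed in version 2.5.
     */
    @Deprecated(since = "2.3", forRemoval = true)
    static int oldSearch() {
        return newSearch(); // delegate so old callers keep working until removal
    }

    static int newSearch() {
        return 42;
    }
}
```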

