Possible, but from the profile I did it was time basically spent in the state machine logic and not newing tokens.
C --- Ben van Klinken <[EMAIL PROTECTED]> wrote: > This raises an interesting point and it's an issue that i think i > dealt with in CLucene. I modified the way the clucene tokenstream > works with some large performance increases. I change the tokenstream > interface to the following: > > from Token next(); to > boolean next(Token t); > > then the document writer can use 1 token object over and over again, > thus removing the need to create and destory tokens, which increased > performance dramatically. maybe java lucene could consider doing the > same thing? or maybe the performance increase is just applicable to > c++, you'd have to try :) > > just an idea ;) > > ben > > On 6/10/05, Chris Collins <[EMAIL PROTECTED]> wrote: > > Forwarding to the dev list as I dont know if this is usefull data....tell > me to > > shut up if it isnt. > > > > Chris > > Note: forwarded message attached. > > > > > > > > > > > > ---------- Forwarded message ---------- > > From: Chris Collins <[EMAIL PROTECTED]> > > To: [email protected], Bill Au <[EMAIL PROTECTED]> > > Date: Thu, 9 Jun 2005 21:58:14 -0700 (PDT) > > Subject: Re: Optimizing indexes with mulitiple processors? > > To follow up. I was surprised to find that from the experiment of indexing > 4k > > documents to local disk (Dell PE with onboard RAID with 256MB cache). I got > the > > following data from my profile: > > > > 70 % time was spent in inverting the document > > 30 % in merge > > > > Ok that part isnt surprising. However only about 1% of 30% of the merge > was > > spent in the OS.flush call (not very IO bound at all with this controller). > > And almost all of the invert was in the StandardAnalyzer pegged in the > javacc > > generated code. The profile was based upon duration and not cpu. The > profiler > > was JProbe. I was using a lower case analyzer and this was a slightly > hacked > > lucene-1.4.3 source code line that I swapped out some of the synchronized > data > > structures (hashtable ->hashmap, Vector->ArrayList). > > > > <<ChRiS>> > > > > --- Chris Collins <[EMAIL PROTECTED]> wrote: > > > > > I found with a fast RAID controller that I can easily be CPU bound, some > of > > > the > > > io is related to latency. You can hide the latency by having overlapping > IO > > > (you get that with multiple indexers going on at the same time). > > > > > > I think there possibly could be more horsepower you can get out of the > > > inverter > > > and merge aspects of the indexing. I am currently jprobeing this at the > > > moment. > > > > > > If your using high latency disks (such as a filer) during merge you may > want > > > to > > > consider increasing the size of the buffers to reduce the amount of rpc's > to > > > the filer....however my previous attempts to change this failed. > > > > > > C > > > > > > --- Bill Au <[EMAIL PROTECTED]> wrote: > > > > > > > Optimize is disk I/O bound. So I am not sure what multiple CPUs will > buy > > > > you. > > > > > > > > Bill > > > > > > > > On 6/9/05, Kevin Burton <[EMAIL PROTECTED]> wrote: > > > > > Is it possible to get Lucene to do an index optimize on multiple > > > > > processors? > > > > > > > > > > Its a single threaded algorithm currently right? > > > > > > > > > > Its a shame since I have a quad machine but I'm only using 1/4th of > the > > > > > capacity. Thats a heck of a performance hit. > > > > > > > > > > Kevin > > > > > > > > > > -- > > > > > > > > > > > > > > > Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com. > > > > > See irc.freenode.net #rojo if you want to chat. > > > > > > > > > > Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html > > > > > > > > > > Kevin A. Burton, Location - San Francisco, CA > > > > > AIM/YIM - sfburtonator, Web - http://peerfear.org/ > > > > > GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
