Re: Slow Commits
Just an update to the list. It appears that memory was the culprit. I attached a JMX console to the running Tomcat instance and monitored memory usage. Used total memory stayed at ~900MB until a commit, then jumped to my Xmx setting of 1.2GB, where the "peak" flatlined and fell back, likely after an OOM exception. I upped the Xmx to 2GB and commits are going much better - in the 1 minute range. Jim Jim Murphy wrote: > > Thanks Jerome, > > > 1. I have shut off autowarming by setting params to 0. > 2. My JVM Settings: -Xmx1200m -Xms1200m -XX:-UseGCOverheadLimit > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 > 3. I am using autocommits - every 6 ms. But the commit blocks all the > master request threadpool threads as it spends 2-3 minutes committing. > 4. I'm reluctant to NOT waitFlush since I don't want commits stacking up. > > > Any other thoughts? > > Thanks > > Jim > > > > > Jérôme Etévé wrote: >> >> Hi, here are two things that can slow down commits: >> >> 1) Autowarming the caches. >> 2) The Java old-generation garbage collection. >> >> You can try: >> - Turning autowarming off (set autowarmCount="0" in the caches >> configuration) >> - If you use the Sun JVM, use -XX:+UseConcMarkSweepGC to get a less >> blocking garbage collection. >> >> You may also try to: >> - Not wait for the new searcher when you commit. The commit will then >> be instant from your posting application's point of view (option >> waitSearcher=false). >> - Leave the commits to the server (by setting autocommits in the >> solrconfig.xml). This is the best strategy if you've got a lot of >> concurrent processes which post. >> >> Cheers. >> >> Jerome. >> >> 2009/10/28 Jim Murphy : >>> >>> Hi All, >>> >>> We have 8 solr shards, index is ~ 90M documents 190GB. :) >>> >>> 4 of the shards have acceptable commit time - 30-60 seconds. The other >>> 4 >>> have drifted over the last couple months to be up around 2-3 minutes. 
>>> This >>> is killing our write throughput as you can imagine. >>> >>> I've included a log dump of a typical commit. Not the large time period >>> (3:40) between the start commit log message and the OnCommit log >>> message. >>> So, I think warming issues are not relevant. >>> >>> Any ideas what to debug at this point? >>> >>> I'm about to issue an optimize and see where that goes. Its been a >>> while >>> since I did that. >>> >>> Cheers, >>> >>> Jim >>> >>> >>> >>> >>> Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2 >>> commit >>> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) >>> Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy >>> onCommit >>> INFO: SolrDeletionPolicy.onCommit: commits:num=2 >>> >>> commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4, >>> _alae.fnm, _ai >>> lk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx, >>> _ala9.fdt, >>> _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis, >>> _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq, >>> _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ail >>> k.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii, >>> _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.pr >>> x, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis, >>> _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _al >>> a9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq, >>> _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _ >>> 962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx, >>> _ala7.fdt, >>> _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3 >>> .del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx, >>> _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx, >>> _alaa.tis, _ailk.fdt, _akzu_8d.del, _alac.frq, 
_akzu.prx, _ala9.nrm, >>> _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al >>> 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx, >>> _al8x.fnm, >>> _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _
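The heap monitoring described at the top of this message - watching used memory climb to the Xmx ceiling through a JMX console - can also be done in-process via the platform MBeans a JMX console reads. A minimal sketch (the `HeapWatch` class name and the polling loop are mine, not from the thread):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Poll the JVM's heap through the same MBeans a JMX console uses.
public class HeapWatch {

    // One-line snapshot of heap used/max in MB.
    static String snapshot() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024);
        return "heap used=" + usedMb + "MB max=" + maxMb + "MB";
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample every second; a used value pinned at max across samples
        // is the "flatline" the post describes just before an OOM.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Logging this around commits would show the same jump-to-Xmx pattern without attaching an external console.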
Re: Slow Commits
Thanks Jerome, 1. I have shut off autowarming by setting params to 0. 2. My JVM Settings: -Xmx1200m -Xms1200m -XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 3. I am using autocommits - every 6 ms. But the commit blocks all the master request threadpool threads as it spends 2-3 minutes committing. 4. I'm reluctant to NOT waitFlush since I don't want commits stacking up. Any other thoughts? Thanks Jim Jérôme Etévé wrote: > > Hi, here are two things that can slow down commits: > > 1) Autowarming the caches. > 2) The Java old-generation garbage collection. > > You can try: > - Turning autowarming off (set autowarmCount="0" in the caches > configuration) > - If you use the Sun JVM, use -XX:+UseConcMarkSweepGC to get a less > blocking garbage collection. > > You may also try to: > - Not wait for the new searcher when you commit. The commit will then > be instant from your posting application's point of view (option > waitSearcher=false). > - Leave the commits to the server (by setting autocommits in the > solrconfig.xml). This is the best strategy if you've got a lot of > concurrent processes which post. > > Cheers. > > Jerome. > > 2009/10/28 Jim Murphy : >> >> Hi All, >> >> We have 8 solr shards, index is ~ 90M documents 190GB. :) >> >> 4 of the shards have acceptable commit time - 30-60 seconds. The other 4 >> have drifted over the last couple months to be up around 2-3 minutes. >> This >> is killing our write throughput as you can imagine. >> >> I've included a log dump of a typical commit. Note the large time period >> (3:40) between the start commit log message and the OnCommit log message. >> So, I think warming issues are not relevant. >> >> Any ideas what to debug at this point? >> >> I'm about to issue an optimize and see where that goes. It's been a while >> since I did that. 
>> >> Cheers, >> >> Jim >> >> >> >> >> Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2 >> commit >> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) >> Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy onCommit >> INFO: SolrDeletionPolicy.onCommit: commits:num=2 >> >> commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4, >> _alae.fnm, _ai >> lk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx, >> _ala9.fdt, >> _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis, >> _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq, >> _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ail >> k.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii, >> _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.pr >> x, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis, >> _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _al >> a9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq, >> _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _ >> 962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx, >> _ala7.fdt, >> _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3 >> .del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx, >> _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx, >> _alaa.tis, _ailk.fdt, _akzu_8d.del, _alac.frq, _akzu.prx, _ala9.nrm, >> _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al >> 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx, _al8x.fnm, >> _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _alaa.frq, _7zfh.fdx >> , _al7i.tis, _ah91.fdt, _ailk.fnm, _9wzn_i0m.del, _ah91.fdx, _al7i.tii, >> _ailk_24j.del, _alad.fdx, _al8x.tii, _alae.fdx, _add1.prx, _akuu.f >> nm, _al8x.tis, _ah91.frq, _ala8.fnm, _7zfh.fdt, _alad.fdt, _alae_1.del, >> 
_alae.fdt, _akzu.frq, _a61p.fnm, _9wzn.frq, _ala8.tii, _7zfh_1gsd. >> del, _7zfh.nrm, _ala7_6.del, _a61p.tis, _9wzn.tii, _alad.frq, _alad.tii, >> _akuu.fdt, _alab.tii, _ala8.tis, _962y_xgg.del, _akh1.frq, _akuu. >> fdx, _alab.tis, _al7i.fnm, _alad.tis, _alac.nrm, _alab.fdx, _ala8_5.del, >> _add1.fdx, _ala7.tii, _akuu_cc.del, _alab.fdt, _9wzn.prx, _alaa.f >> dx, _al9w.fdt, _al2o.frq, _akh1_nf.del, _alac.prx, _akh1.fdx, _alaa.fdt, >> _al9w.fdx, _al8x_17.del, _add1.fdt, _al2o.prx, _akh1.fdt, _alad.p >> rx, _akuu.prx, _962y.frq, _al2o_66.del, _alac.fdt, _ala7.tis, _a61p.tii, >> _alac.fdx, _al8x.fdt, _9wzn.tis,
Slow Commits
Hi All, We have 8 solr shards, index is ~90M documents, 190GB. :) 4 of the shards have acceptable commit time - 30-60 seconds. The other 4 have drifted over the last couple months to be up around 2-3 minutes. This is killing our write throughput as you can imagine. I've included a log dump of a typical commit. Note the large time period (3:40) between the start commit log message and the OnCommit log message. So, I think warming issues are not relevant. Any ideas what to debug at this point? I'm about to issue an optimize and see where that goes. It's been a while since I did that. Cheers, Jim Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4, _alae.fnm, _ailk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx, _ala9.fdt, _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis, _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq, _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ailk.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii, _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.prx, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis, _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _ala9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq, _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx, _ala7.fdt, _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3.del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx, _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx, _alaa.tis, _ailk.fdt, 
_akzu_8d.del, _alac.frq, _akzu.prx, _ala9.nrm, _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx, _al8x.fnm, _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _alaa.frq, _7zfh.fdx , _al7i.tis, _ah91.fdt, _ailk.fnm, _9wzn_i0m.del, _ah91.fdx, _al7i.tii, _ailk_24j.del, _alad.fdx, _al8x.tii, _alae.fdx, _add1.prx, _akuu.f nm, _al8x.tis, _ah91.frq, _ala8.fnm, _7zfh.fdt, _alad.fdt, _alae_1.del, _alae.fdt, _akzu.frq, _a61p.fnm, _9wzn.frq, _ala8.tii, _7zfh_1gsd. del, _7zfh.nrm, _ala7_6.del, _a61p.tis, _9wzn.tii, _alad.frq, _alad.tii, _akuu.fdt, _alab.tii, _ala8.tis, _962y_xgg.del, _akh1.frq, _akuu. fdx, _alab.tis, _al7i.fnm, _alad.tis, _alac.nrm, _alab.fdx, _ala8_5.del, _add1.fdx, _ala7.tii, _akuu_cc.del, _alab.fdt, _9wzn.prx, _alaa.f dx, _al9w.fdt, _al2o.frq, _akh1_nf.del, _alac.prx, _akh1.fdx, _alaa.fdt, _al9w.fdx, _al8x_17.del, _add1.fdt, _al2o.prx, _akh1.fdt, _alad.p rx, _akuu.prx, _962y.frq, _al2o_66.del, _alac.fdt, _ala7.tis, _a61p.tii, _alac.fdx, _al8x.fdt, _9wzn.tis, _9wzn.fdt, _al8x.fdx, _9wzn.fdx, _ah91_35l.del] commit{dir=/master/data/index,segFN=segments_8us5,version=1228872482132,generation=413141,filenames=[_ala9.fnm, _alaa_5.del, _alab .fnm, _962y_xgh.del, _al8x.frq, _akh1.tis, _add1.frq, _alae.tis, _7zfh_1gse.del, _alad.nrm, _alae.tii, _akuu.tis, _ah91_35m.del, _ailk.frq , _7zfh.tii, _962y.tis, _akuu.tii, _ah91.prx, _7zfh.tis, _ala8.frq, _962y.tii, _ala7.fnm, _akzu.fnm, _9wzn.fnm, _ala9_2.del, _ala8.nrm, _a laf.fnm, _alae.nrm, _ala9.prx, _ailk_24k.del, _alaf.prx, _al9w.prx, _ala8.prx, _akh1.tii, _akzu.tii, _akzu.tis, _alad.fnm, _al2o.fnm, _962 y.fnm, _al8x_18.del, _ala7_7.del, _alaa.tis, _ala9.nrm, _ala9.tis, _alaa.tii, _962y.nrm, _ala9.tii, _a61p.prx, _add1_62v.del, _al8x.fnm, _ 7zfh.fnm, _al7i_2g.del, _ailk.fnm, _al8x.tii, _al8x.tis, _ala8.fnm, _akzu.frq, _9wzn.frq, _7zfh.nrm, _akuu.fdt, _alad.tii, _akuu.fdx, _aku u_cd.del, _a61p_b77.del, _alad.tis, _al2o_67.del, _add1.fdx, 
_9wzn.prx, _al9w.fdt, _add1.fdt, _al9w.fdx, _akuu.prx, _962y.frq, _9wzn.fdt, _alab_4.del, _9wzn.fdx, segments_8us5, _alac_4.del, _alae.fnm, _ailk.tis, _ala9.fdx, _alac.fnm, _ala9.fdt, _alab.prx, _alae_2.del, _alaa.f nm, _alad_1.del, _al9w.frq, _ailk.tii, _add1.tis, _alac.tii, _add1.tii, _alac.tis, _ala7.frq, _ah91.fnm, _a61p.fdt, _alae.prx, _akuu.frq, _a61p.fdx, _akh1_ng.del, _al7i.fdx, _al2o.tis, _al9w.tis, _a61p.frq, _akh1.prx, _9wzn_i0n.del, _al7i.fdt, _al7i.prx, _962y.prx, _al9w.tii, _al7i.frq, _ah91.tii, _962y.fdt, _akh1.fnm, _962y.fdx, _ah91.tis, _al8x.prx, _al2o.tii, _ala7.fdx, _ala7.fdt, _alaf.fdx, _alaf.fdt, _al2o .fdx, _7zfh.frq, _akzu.fdt, _alaf.nrm, _akzu.fdx, _alab.nrm, _al2o.fdt, _alaa.prx, _alaa.nrm, _ala7.prx, _ailk.fdt, _akzu.prx, _alac.frq, _ailk.prx, _alaf.tii, _alaf_1.del, _alae.frq, _add1.fnm, _alaf.tis, _7zfh.prx, _al9w.fnm, _ala9.frq, _alab.frq, _ala8.fdx, _akzu_8e.del, _ ala8.fdt, _ailk.fdx, _alaa.frq, _al7i.tis, _7zfh.fdx, _al9w_i.del, _ah91.fdt, _a
Re: Autocommit blocking adds? AutoCommit Speedup?
Yonik Seeley-2 wrote: > > ...your code snippet elided and edited below ... > Don't take this code as correct (or even compiling) but is this the essence? I moved shared access to the writer inside the read lock and kept the other non-commit bits under the write lock. I'd need to rethink the locking in a more fundamental way but is this close to the idea?

public void commit(CommitUpdateCommand cmd) throws IOException {
  if (cmd.optimize) {
    optimizeCommands.incrementAndGet();
  } else {
    commitCommands.incrementAndGet();
  }

  Future[] waitSearcher = null;
  if (cmd.waitSearcher) {
    waitSearcher = new Future[1];
  }

  boolean error = true;
  iwCommit.lock();
  try {
    log.info("start " + cmd);
    if (cmd.optimize) {
      closeSearcher();
      openWriter();
      writer.optimize(cmd.maxOptimizeSegments);
    }
  } finally {
    iwCommit.unlock();
  }

  iwAccess.lock();
  try {
    writer.commit();
  } finally {
    iwAccess.unlock();
  }

  iwCommit.lock();
  try {
    callPostCommitCallbacks();
    if (cmd.optimize) {
      callPostOptimizeCallbacks();
    }
    // open a new searcher in the sync block to avoid opening it
    // after a deleteByQuery changed the index, or in between deletes
    // and adds of another commit being done.
    core.getSearcher(true, false, waitSearcher);

    // reset commit tracking
    tracker.didCommit();
    log.info("end_commit_flush");
    error = false;
  } finally {
    iwCommit.unlock();
    addCommands.set(0);
    deleteByIdCommands.set(0);
    deleteByQueryCommands.set(0);
    numErrors.set(error ? 1 : 0);
  }

  // if we are supposed to wait for the searcher to be registered, then we should do it
  // outside of the synchronized block so that other update operations can proceed.
  if (waitSearcher != null && waitSearcher[0] != null) {
    try {
      waitSearcher[0].get();
    } catch (InterruptedException e) {
      SolrException.log(log, e);
    } catch (ExecutionException e) {
      SolrException.log(log, e);
    }
  }
}

-- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23454419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Autocommit blocking adds? AutoCommit Speedup?
Any pointers to this newer, more concurrent behavior in Lucene? I can try an experiment where I downgrade the iwCommit lock to the iwAccess lock to allow updates to happen during commit. Would you expect that to work? Thanks for bootstrapping me on this. Jim Yonik Seeley-2 wrote: > > On Thu, May 7, 2009 at 8:37 PM, Jim Murphy wrote: >> Interesting. So is there a JIRA ticket open for this already? Any chance >> of >> getting it into 1.4? > > No ticket currently open, but IMO it could make it for 1.4. > >> It's seriously kicking our butts right now. We write >> into our masters with ~50ms response times till we hit the autocommit >> then >> add/update response time is 10-30 seconds. Ouch. > > It's probably been made a little worse lately since Lucene now does > fsync on index files before writing the segments file that points to > those files. A necessary evil to prevent index corruption. > >> I'd be willing to work on submitting a patch given a better understanding >> of >> the issue. > > Great, go for it! > > -Yonik > http://www.lucidimagination.com > > -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23452011.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Autocommit blocking adds? AutoCommit Speedup?
Interesting. So is there a JIRA ticket open for this already? Any chance of getting it into 1.4? It's seriously kicking our butts right now. We write into our masters with ~50ms response times till we hit the autocommit; then add/update response time is 10-30 seconds. Ouch. I'd be willing to work on submitting a patch given a better understanding of the issue. Jim -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23438134.html Sent from the Solr - User mailing list archive at Nabble.com.
Autocommit blocking adds? AutoCommit Speedup?
Question 1: I see in DirectUpdateHandler2 that there is a read/write lock used between addDoc and commit. My mental model of the process was this: clients can add/update documents until the autocommit threshold is hit. At that point the commit tracker schedules a background commit. The commit would run and NOT BLOCK subsequent adds. Clearly that's not happening, because when the autocommit background thread runs it takes the iwCommit lock, blocking anyone in addDoc trying to get the iwAccess lock. Is this just the way it is, or is it possible to configure Solr to process the pending documents in the background, queuing new documents in memory as before? Question 2: I ask this question because autocommits are taking a LONG time to complete, like 10-25 seconds. I have a 40M document index, many 10s of GBs. What can I do to speed this up? Thanks Jim -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23435224.html Sent from the Solr - User mailing list archive at Nabble.com.
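The blocking in Question 1 can be modeled with a plain ReentrantReadWriteLock - a toy sketch of the iwAccess/iwCommit pattern described above, not Solr's actual code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the locking described above: addDoc takes the shared
// "access" (read) lock, commit takes the exclusive "commit" (write)
// lock, so a running commit blocks every concurrent add.
public class CommitLockDemo {
    final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    void commitBegin() { rw.writeLock().lock(); }
    void commitEnd()   { rw.writeLock().unlock(); }

    // addDoc path: succeeds only if no commit holds the write lock.
    boolean tryAdd() {
        if (rw.readLock().tryLock()) {
            rw.readLock().unlock();
            return true;
        }
        return false;
    }

    // Probe from a second thread, because the committing thread itself
    // could reentrantly downgrade and take the read lock.
    static boolean addFromOtherThread(CommitLockDemo d) throws InterruptedException {
        final boolean[] ok = new boolean[1];
        Thread t = new Thread(() -> ok[0] = d.tryAdd());
        t.start();
        t.join();
        return ok[0];
    }

    public static void main(String[] args) throws InterruptedException {
        CommitLockDemo d = new CommitLockDemo();
        System.out.println("add before commit: " + addFromOtherThread(d)); // true
        d.commitBegin();
        System.out.println("add during commit: " + addFromOtherThread(d)); // false
        d.commitEnd();
        System.out.println("add after commit:  " + addFromOtherThread(d)); // true
    }
}
```

Adds succeed before and after the commit and are blocked while the write lock is held - the same shape as the add/update stalls reported during autocommit.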
Re: Tomcat 6 HTTP Connector Threads all blocked
pplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Interpreted frame) - org.apache.catalina.core.StandardWrapperValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=804, line=233 (Interpreted frame) - org.apache.catalina.core.StandardContextValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=285, line=175 (Interpreted frame) - org.apache.catalina.core.StandardHostValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=64, line=128 (Interpreted frame) - org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=102 (Interpreted frame) - org.apache.catalina.valves.AccessLogValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=563 (Interpreted frame) - org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=42, line=109 (Interpreted frame) - org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) @bci=157, line=263 (Interpreted frame) - org.apache.coyote.http11.Http11Processor.process(java.net.Socket) @bci=432, line=844 (Interpreted frame) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(java.net.Socket) @bci=82, line=584 (Interpreted frame) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run() @bci=41, line=447 (Interpreted frame) - java.lang.Thread.run() @bci=11, line=619 (Interpreted frame) Yonik Seeley-2 wrote: > > On Sun, Mar 1, 2009 at 10:32 AM, Jim Murphy wrote: >> I should have said - tomcat is hosting 2 webapps a solr 1.3 master and >> slave >> - as separate web apps. 
> > Given that the socket writes are blocked, it appears like whatever is > supposed to be reading the other endpoint isn't doing its job. > > Are you using java-based replication? Do you know if these sockets > that are blocking are from client queries or from replication > requests? Splitting up the master and slave into separate JVMs might > help shed some light on the situation. > > -Yonik > http://www.lucidimagination.com > > -- View this message in context: http://www.nabble.com/Tomcat-6-HTTP-Connector-Threads-all-blocked-tp22274107p22278035.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tomcat 6 HTTP Connector Threads all blocked
I should have said - tomcat is hosting 2 webapps a solr 1.3 master and slave - as separate web apps. Looking for anything to try. Jim Jim Murphy wrote: > > I have a 100 thread HTTP connector pool that for some reason ends up with > all its threads blocked here: > > java.net.SocketOutputStream.socketWrite0(Native Method) > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) > java.net.SocketOutputStream.write(SocketOutputStream.java:136) > org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:737) > org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434) > org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349) > org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:761) > org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126) > org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:570) > org.apache.coyote.Response.doWrite(Response.java:560) > org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353) > org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434) > org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309) > org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:273) > org.apache.catalina.connector.Response.finishResponse(Response.java:486) > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:287) > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584) > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > java.lang.Thread.run(Thread.java:619) > > > Any hints on what to try to diagnose? 
> > Regards > > Jim > -- View this message in context: http://www.nabble.com/Tomcat-6-HTTP-Connector-Threads-all-blocked-tp22274107p22274129.html Sent from the Solr - User mailing list archive at Nabble.com.
Tomcat 6 HTTP Connector Threads all blocked
I have a 100 thread HTTP connector pool that for some reason ends up with all its threads blocked here: java.net.SocketOutputStream.socketWrite0(Native Method) java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) java.net.SocketOutputStream.write(SocketOutputStream.java:136) org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:737) org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434) org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349) org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:761) org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126) org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:570) org.apache.coyote.Response.doWrite(Response.java:560) org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353) org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434) org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309) org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:273) org.apache.catalina.connector.Response.finishResponse(Response.java:486) org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:287) org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584) org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) java.lang.Thread.run(Thread.java:619) Any hints on what to try to diagnose? Regards Jim -- View this message in context: http://www.nabble.com/Tomcat-6-HTTP-Connector-Threads-all-blocked-tp22274107p22274107.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrconfig clarification: ColdSearcher/MaxWarmingSearcher
Thanks for the clarification and for untangling my questions. :) I'm in the process of finding out why our snapshot installs take so long to commit and didn't feel so confident about my settings, thanks. In terms of long snapshot commits - I've isolated it to long warming times. But since the warming query I use has the same basic layout as "the" query we do at runtime, I'm not sure what to do. At the moment I'm trying to isolate where the time is spent - whether it's just in pre-allocating large arrays and other data structures like the FieldSortedHitQueue, because my queries tend to be date sorted... Index facts: 1. 22M documents 2. Snapshot installs to slave: 5 minutes 3. Warmup times: ~200-400 seconds, hence the backup of warming searchers The problem with the backed-up searchers is that I run my JVM out of heap space if more than one searcher is warming up in the background. Continuing to profile commit/warming queries. Any helpful hints would be much appreciated. Cheers, Jim -- View this message in context: http://www.nabble.com/Solrconfig-clarification%3A-ColdSearcher-MaxWarmingSearcher-tp20904462p21003725.html Sent from the Solr - User mailing list archive at Nabble.com.
Solrconfig clarification: ColdSearcher/MaxWarmingSearcher
I have a cluster of Solr Master/Slaves. We write to the master and replicate to the slaves via rsync. Master: 1. Replication is every 5 minutes. 2. Inserting many 100's of docs per minute 3. Index is: 23 million documents 4. commits are every 30 seconds Slave: 1. Pre-warming after an rsync snapshot takes ~50 seconds 2. many queries per second So given that, how should I set up the following searcher configs: <useColdSearcher>false</useColdSearcher> <maxWarmingSearchers>5</maxWarmingSearchers> Here's what I'm thinking: For the Master: I don't care about searchers, we do no autowarming and never query, so ... 1, or 5, or what does this even mean? Current settings: useColdSearcher: true, why do I care? maxWarmingSearchers: 1 - because I can't see why I would ever have more than one, but I'm concerned since the docs advise otherwise for high-throughput masters, which applies in my case. For the Slave: Again, it seems we should get 1 new searcher every 5 minutes, but there would be 2 existing for ~50 seconds as the second one autowarms. useColdSearcher: false, wait for the warming to do the heavy lifting maxWarmingSearchers: 1 again, always use just one? Thanks for any insights into these Jim -- View this message in context: http://www.nabble.com/Solrconfig-clarification%3A-ColdSearcher-MaxWarmingSearcher-tp20904462p20904462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index updates blocking readers: To Multicore or not?
We shred the RSS into individual items, then create Solr XML documents to insert. Solr is an easy choice for us over straight Lucene since it adds the server infrastructure that we would mostly be writing ourselves - caching, data types, master/slave replication. We looked at Nutch too - but that was before my time. Jim John Martyniak-3 wrote: > > Thank you, that is good information, as that is the way that I am > leaning. > > So when you fetch the content from RSS, does that get rendered to an > XML document that Solr indexes? > > Also, what were a couple of the decision points for using Solr as opposed > to using Nutch, or even straight Lucene? > > -John > > > > On Oct 22, 2008, at 11:22 AM, Jim Murphy wrote: > >> >> We index RSS content using our own home-grown distributed spiders - >> not using >> Nutch. We use ruby processes to do the feed fetching and XML >> shredding, and >> Amazon SQS to queue up work packets to insert into our Solr cluster. >> >> Sorry can't be of more help. >> >> -- >> View this message in context: >> http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > > -- View this message in context: http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20114697.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index updates blocking readers: To Multicore or not?
We index RSS content using our own home-grown distributed spiders - not using Nutch. We use ruby processes to do the feed fetching and XML shredding, and Amazon SQS to queue up work packets to insert into our Solr cluster. Sorry can't be of more help. -- View this message in context: http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index updates blocking readers: To Multicore or not?
Thanks Yonik, I have more information... 1. We do indeed have large indexes: 40GB on disk, 30M documents - and this is just one test server; we have 8 of these in parallel. 2. The performance problem I was seeing followed replication and the first query on a new searcher. It turns out we hadn't configured index warming queries very well, so we moved from the various "solr rocks" type queries to one that was better for our data - and saw no improvement. The problem was that replication completed, a new searcher was created and registered, but the first query would take 10-20 seconds to complete. Thereafter it took <200 milliseconds for similar non-cached queries. The profiler pointed us to building the FieldSortedHitQueue, which was taking all the time. Our warming query did not include a sort, but our queries commonly do. Once we added the sort parameter, our warming query started taking the 10-20 seconds prior to registering the searcher. After that, the first query on the new searcher took the expected 200ms. LESSON LEARNED: warm your caches! And, if a sort is involved in your queries, incorporate that sort in your warming query! Add a warming query for each kind of sort that you expect to do. Yonik Seeley wrote: > > On Mon, Oct 6, 2008 at 2:10 PM, Jim Murphy <[EMAIL PROTECTED]> wrote: >> We have a farm of several Master-Slave pairs all managing a single very >> large >> "logical" index sharded across the master-slaves. We notice on the >> slaves, >> after an rsync update, as the index is being committed that all queries >> are >> blocked sometimes resulting in unacceptable service times. I'm looking >> at >> ways we can manage these "update burps". > > Updates should never block queries. > What version of Solr are you using? > Is it possible that your indexes are so big, opening a new index in > the background causes enough of the old index to be flushed from OS > cache, causing big slowdowns? 
> > -Yonik > > >> Question #1: Anything obvious I can tweak in the configuration to >> mitigate >> these multi-second blocking updates? Our Indexes are 40GB, 20M documents >> each. RSync updates are every 5 minutes several hundred KB per update. >> >> Question #2: I'm considering setting up each slave with multiple Solr >> cores. >> The 2 indexes per instance would be nearly identical copies but "A" would >> be >> read from while "B" is being updated, then they would swap. I'll have to >> figure out how to rsync these 2 indexes properly but if I can get the >> commits to happen to the offline index then I suspect my queries could >> proceed unblocked. >> >> Is this the wrong tree to be barking up? Any other thoughts? >> >> Thanks in advance, >> >> Jim >> >> >> >> -- >> View this message in context: >> http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p19843098.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20112546.html Sent from the Solr - User mailing list archive at Nabble.com.
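The lesson learned above can be captured in solrconfig.xml as a newSearcher warming listener that includes the sorts used at query time. A sketch only - the field name published_date and the queries are placeholders, not the poster's actual schema:

```xml
<!-- Sketch of a warming listener; "published_date" is a placeholder field.
     Fire one query per sort you expect at runtime so each new searcher
     pre-builds the corresponding sort structures before it is registered. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">published_date desc</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="sort">published_date asc</str>
    </lst>
  </arr>
</listener>
```

With this in place, the FieldSortedHitQueue-style cost is paid during warming instead of on the first user query against the new searcher.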
FileNotFoundException on slave after replication - script bug?
We're seeing strange behavior on one of our slave nodes after replication. When the new searcher is created we see FileNotFoundExceptions in the log and the index is strangely invalid/corrupted. We may have identified the root cause but wanted to run it by the community. We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`

This line determines the directory name of the latest snapshot to download to the slave from the master. The problem with this line is that it can grab the temporary work directory of a snapshot in progress. Those temporary directories are prefixed with "temp" and, as far as I can tell, should never get pulled from the master, so it's easy to disambiguate. It seems that this temp directory, if it exists, will be the newest one, so if present it will be the one replicated: FAIL. We've tweaked the line to exclude any directories starting with "temp":

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls ${master_data_dir}|grep 'snapshot\.'|grep -v wip|grep -v temp|sort -r|head -1"`

This has fixed our local issue. We can submit a patch, but wanted a quick sanity check because I'm surprised it's not much more commonly seen. Jim -- View this message in context: http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--tp20111313p20111313.html Sent from the Solr - User mailing list archive at Nabble.com.
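The bug and the fix are easy to check locally. A sketch that reproduces the selection pipeline against hypothetical snapshot directory names (ssh removed; the pipeline is otherwise the same as line 181):

```shell
# Fake master data dir with two finished snapshots and one in-progress
# "temp" work dir (names are hypothetical).
master_data_dir=$(mktemp -d)
mkdir "${master_data_dir}/snapshot.20081022120000" \
      "${master_data_dir}/snapshot.20081022130000" \
      "${master_data_dir}/temp-snapshot.20081022140000"

# Original selection: "temp-..." sorts after "snapshot-..." in reverse
# order, so the in-progress work dir is picked.
buggy=$(ls ${master_data_dir} | grep 'snapshot\.' | grep -v wip | sort -r | head -1)

# Tweaked selection: also exclude in-progress "temp" directories.
fixed=$(ls ${master_data_dir} | grep 'snapshot\.' | grep -v wip | grep -v temp | sort -r | head -1)

echo "buggy picks: ${buggy}"
echo "fixed picks: ${fixed}"
rm -rf "${master_data_dir}"
```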
Index updates blocking readers: To Multicore or not?
We have a farm of several Master-Slave pairs, all managing a single very large "logical" index sharded across the master-slaves. We notice on the slaves, after an rsync update, that as the index is being committed all queries are blocked, sometimes resulting in unacceptable service times. I'm looking at ways we can manage these "update burps". Question #1: Anything obvious I can tweak in the configuration to mitigate these multi-second blocking updates? Our indexes are 40GB, 20M documents each. Rsync updates run every 5 minutes, several hundred KB per update. Question #2: I'm considering setting up each slave with multiple Solr cores. The 2 indexes per instance would be nearly identical copies, but "A" would be read from while "B" is being updated, then they would swap. I'll have to figure out how to rsync these 2 indexes properly, but if I can get the commits to happen to the offline index then I suspect my queries could proceed unblocked. Is this the wrong tree to be barking up? Any other thoughts? Thanks in advance, Jim -- View this message in context: http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p19843098.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Calculated Unique Key Field
Thanks, Shalin. Shalin Shekhar Mangar wrote: > > On Wed, Oct 1, 2008 at 12:08 AM, Jim Murphy <[EMAIL PROTECTED]> wrote: > >> >> Question1: Is this the best place to do this? > > > This sounds like a job for > http://wiki.apache.org/solr/UpdateRequestProcessor > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19842973.html Sent from the Solr - User mailing list archive at Nabble.com.
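For reference, the hashing itself needs nothing beyond the JDK. A self-contained sketch of the MD5-over-identity-fields computation - the field values and the '|' separator convention are assumptions; inside Solr this would run from an UpdateRequestProcessor's processAdd(), per Shalin's suggestion:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Key {
    // Hash the identity fields into a 32-character hex MD5 key.
    static String md5Hex(String... identityFields) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String field : identityFields) {
                md.update(field.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '|'); // separator so ("ab","c") != ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        // Hypothetical identity fields for a document.
        System.out.println(md5Hex("http://example.com/posts/1", "feed-42"));
    }
}
```

Doing this in a processor rather than the update handler also answers Question 2 from the original post: the processor mutates the SolrInputDocument before the Lucene Document is built, so the value only has to be set once.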
Re: Calculated Unique Key Field
It may not be all that relevant but our Update handler extends from DirectUpdateHandler2. -- View this message in context: http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19748032.html Sent from the Solr - User mailing list archive at Nabble.com.
Calculated Unique Key Field
My unique key field is an MD5 hash of several other fields that represent the identity of documents in my index. We've been calculating this externally and setting the key value in documents, but have found recurring bugs as the number and variety of inserting consumers has grown... So I wanted to move to calculating these at "add" time. We already have our own UpdateHandler, extending from DirectUpdateHandler2, so I extended its addDoc method to do the hashing and field setting. Here are the implementation highlights:

String postGuid =

// set the value - overwrite if already present
{
    SolrInputField postGuidField = doc.getField(POST_GUID_NAME);
    if (postGuidField != null) {
        postGuidField.setValue(postGuid, DEFAULT_BOOST);
    } else {
        doc.addField(POST_GUID_NAME, postGuid);
    }
}

{
    // add guid field to the lucene doc too - huh.
    Document lucDoc = cmd.getLuceneDocument(schema);
    Field aiPostGuidField = lucDoc.getField(POST_GUID_NAME);
    if (aiPostGuidField != null) {
        aiPostGuidField.setValue(postGuid);
    } else {
        SchemaField aiPostGuidSchemaField = schema.getField(POST_GUID_NAME);
        Field postGuidField = aiPostGuidSchemaField.createField(postGuid, DEFAULT_BOOST);
        lucDoc.add(postGuidField);
    }
}

Question 1: Is this the best place to do this? Question 2: Is there a way around adding it to both the SolrDocument and the Lucene Document? Thoughts? Best regards, Jim -- View this message in context: http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19747955.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr for Info Retreval not so much Search...
*Excellent* so a custom QueryComponent it is. The Solr score doesn't factor in too much - our search needs are modest - just does it contain the keyword (or variants, stems etc) or not. So the query trims down from ~100M to 10-100. That way the more expensive filtering operates on the smaller set, as you suggest. I need to sort by one of my date fields or the external rank. The first is easy. The second is difficult, so I will have to query the external system for all matching docs - but if it's on the reduced set it's manageable. One remaining question: I'd like to include my external threshold value in the document. Any ideas? Can I stuff a float field somewhere on the docs? Thanks! Jim hossman wrote: > > > : 1. Query the index for entries matching keyword. > : 2. remove any entries that are below a threshold score from the external > : system > > what do you need to sort by? .. if it's the threshold score from your > external system, you have no way of avoiding a call out to your external > system for every matching doc ... if you want to sort by the "Solr Score" > then it should be fairly easy to write a SearchComponent that gets a > DocList and walks them in order removing anything that doesn't meet the > threshold (re-executing the query with a higher number of rows if it > exhausts the current DocList) untill you've got enough to return to your > client. > > > -Hoss > > > -- View this message in context: http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18744997.html Sent from the Solr - User mailing list archive at Nabble.com.
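Hoss's over-fetch-and-filter loop can be sketched independently of the Solr APIs. Here fetch(rows) stands in for re-executing the query with a higher row count, and passes() stands in for the external threshold check; all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

public class ThresholdFilter {
    // Keep fetching larger result pages until `wanted` docs survive the
    // external threshold check, or the index is exhausted.
    static <T> List<T> filterToCount(IntFunction<List<T>> fetch,
                                     Predicate<T> passes, int wanted) {
        int rows = wanted;
        while (true) {
            List<T> docs = fetch.apply(rows);
            List<T> kept = new ArrayList<>();
            for (T doc : docs) {
                if (passes.test(doc)) kept.add(doc);
                if (kept.size() == wanted) return kept;
            }
            if (docs.size() < rows) return kept; // no more matches to fetch
            rows *= 2; // re-execute the query with a higher number of rows
        }
    }

    public static void main(String[] args) {
        // Toy "index" of doc ids; pretend even ids pass the external check.
        List<Integer> index = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> top = filterToCount(
            rows -> index.subList(0, Math.min(rows, index.size())),
            doc -> doc % 2 == 0,
            3);
        System.out.println(top); // prints [2, 4, 6]
    }
}
```

In a real SearchComponent the fetch would re-run the query for a DocList with more rows, and the external scores fetched along the way are worth caching since the doubling loop revisits the head of the list.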
Understanding Filters
I'm still trying to filter my search results with external data. I have ~100 million documents in the index. I want to use the power of Lucene to knock that index down to 10-100 with keyword searching and a few other regular query terms. With that smaller subset I'd like to apply a filter based on making calls to an external system, to further reduce that set to 5-20. Looking at filter queries and Lucene search filters, it seems that they iterate over the entire index to create a bitset of documents to be included in the query. This seems the inverse of my needs. I can't make ~100 million external calls to filter - I want Lucene to handle that heavy lifting. I'm trying to figure out the right place to hook in to let paging and caching in Solr work as normal, but drop out result documents based on that expensive external call. Thanks, and sorry for the repeat requests. Jim -- View this message in context: http://www.nabble.com/Understanding-Filters-tp18742220p18742220.html Sent from the Solr - User mailing list archive at Nabble.com.
Question about ValueSource and large datasets
I'm looking to incorporate an external calculation in Solr/Lucene search results. I'd like to write queries that filter and sort on the value of this "virtual field". The value of the field is actually calculated at runtime based on a remote call to an external system. My Solr queries will include term queries to match keywords - nothing special - but I'd like to filter and order results based on the virtual field as well. I started looking at a custom field type + ValueSource. I add a field of this "virtual field type" to the schema, and have the custom ValueSource wired in to the field type. I used the FileFloatSource example as inspiration - seems OK - but 2 questions: 1. How do I query for my virtual field? My ValueSource never seems to be activated, no matter what I query for. Here are the relevant parts of my schema - see any issues? Any hints on what the query string should be? ... 2. How can I limit the number of external calls I need to make? If I use FunctionQuery syntax then my ValueSource is used. But, a BIG but, I notice that it is queried for field values for every document in the index. My index is 100 million documents but typical result size is on the order of tens. I'd like to perform the external call on those tens, not on the entire index every time. ValueSource:

DocValues getValues(IndexReader reader) throws IOException {
    final float[] arr = getCachedFloats(reader);
    return new DocValues() {
        public float floatVal(int doc) {
            ...called 100 million times...
        }
        ...

I like this approach a lot but I'm getting the feeling that I want to hook later in the query process - after the initial query (matching keywords) is done and the document set is reduced from 100 million to tens. Do I really want a filter query of some kind? Or some other layer of filtering? 
Thanks in advance, Jim -- View this message in context: http://www.nabble.com/Question-about-ValueSource-and-large-datasets-tp18733993p18733993.html Sent from the Solr - User mailing list archive at Nabble.com.
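For comparison with the custom field type, here is a sketch of how the stock ExternalFileField (the type behind FileFloatSource) is declared in schema.xml; the field, key, and type names are assumptions:

```xml
<!-- Values come from a file named external_rank (key=value lines,
     keyed by the unique key field) in the index data directory. -->
<fieldType name="externalRankType" class="solr.ExternalFileField"
           keyField="post_guid" defVal="0" valType="pfloat"/>
<field name="rank" type="externalRankType"
       indexed="false" stored="false"/>
```

One point worth noting on Question 1: a ValueSource is only consulted through function-query syntax (e.g. a _val_:rank clause), never by a plain term query - which would explain why the custom ValueSource appears inactive for ordinary keyword queries.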
Re: Using Solr for Info Retreval not so much Search...
Thanks Walter, My requirements are this: 1. Query the index for entries matching keyword. 2. remove any entries that are below a threshold score from the external system I'm looking at building a custom field type similar to ExternalFileField that can dole out a ValueSource that calls my external system. Jim Walter Underwood wrote: > > You might be able to split the ranking into a common score and > a dynamic score. Return the results nearly the right order, then > do a minimal reordering after. If you plan to move a result by > a maximum of five positions, then you could fetch 15 results to > show 10 results. That is far, far cheaper than fetching all > results and ranking them all. > > I wrote a description of the client-side part of this last year: > > http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.htm > l > > wunder > > On 7/29/08 5:59 PM, "Jim Murphy" <[EMAIL PROTECTED]> wrote: > >> >> If figured that it would be - but the rankings are dynamically >> calculated. >> I'd like to limit the number of calculations performed for this very >> reason. >> Still not sure if this approach will be better than naivly filtering docs >> after the query has happened. >> >> Reading about ValueSource thanks... >> >> Jim >> >> >> >> Yonik Seeley wrote: >>> >>> Calling out will be an order of magnitude (or two) slower compared to >>> moving the rankings into Solr, but it is doable. See ValueSource >>> (it's used by FunctionQuery). >>> >>> -Yonik >>> >>> On Tue, Jul 29, 2008 at 8:23 PM, Jim Murphy <[EMAIL PROTECTED]> >>> wrote: >>>> >>>> I take it I can add my own functions that would take care of calling >>>> out >>>> to >>>> my external ranking system? >>>> >>>> Looking for docs on that... >>>> >>>> Jim >>>> >>>> >>>> Yonik Seeley wrote: >>>>> >>>>> A function query might fit your needs... you could move some or all of >>>>> your external ranking system into Solr. 
>>>>> >>>>> -Yonik >>>>> >>>>> On Tue, Jul 29, 2008 at 7:08 PM, Jim Murphy <[EMAIL PROTECTED]> >>>>> wrote: >>>>>> >>>>>> I need to store 100 million documents in our Solr instance and be >>>>>> able >>>>>> to >>>>>> retrieve them with simple term queries - keyword matches. I'm NOT >>>>>> implementing a search application where documents are scored and >>>>>> ranked...they either match the keywords or not. Also, I have an >>>>>> external >>>>>> ranking system that I need to use to filter and order the search >>>>>> results. >>>>>> >>>>>> My requirements are for the very fast and reliable retrieval so I'm >>>>>> trying >>>>>> to figure a place to hook in or customize Solr/Lucene to just do the >>>>>> simplest thing, reliably and fast. >>>>>> >>>>>> 1. A naive approach would be to implement a handler, let the query >>>>>> happen >>>>>> normally then perform N lookups to my external scoring system then >>>>>> filter >>>>>> and sort the documents. It seems I may be doing a lot of extra work >>>>>> that >>>>>> way, especially with paging results and who knows what I'd doing to >>>>>> the >>>>>> cache. >>>>>> >>>>>> 2. Create a custom FieldType that is virtual and calls out to my >>>>>> external >>>>>> system? Then queries could be written to return all docs > my rank. >>>>>> >>>>>> 3. Implement custom Query, Weight, Scorer (et al) implementations to >>>>>> minimize the "Search Stuff" and just delegate calls to my external >>>>>> ranking >>>>>> system. >>>>>> >>>>>> 4. A filter of some kind? >>>>>> >>>>>> >>>>>> I'd love to get a sanity check on any of these approaches or some >>>>>> recommendations. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Jim >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp1 >>>> 8723102p18723877.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. 
>>>> >>>> >>> >>> > > > -- View this message in context: http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18724853.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr for Info Retreval not so much Search...
I figured that it would be - but the rankings are dynamically calculated. I'd like to limit the number of calculations performed for this very reason. Still not sure if this approach will be better than naively filtering docs after the query has happened. Reading about ValueSource, thanks... Jim Yonik Seeley wrote: > > Calling out will be an order of magnitude (or two) slower compared to > moving the rankings into Solr, but it is doable. See ValueSource > (it's used by FunctionQuery). > > -Yonik > > On Tue, Jul 29, 2008 at 8:23 PM, Jim Murphy <[EMAIL PROTECTED]> wrote: >> >> I take it I can add my own functions that would take care of calling out >> to >> my external ranking system? >> >> Looking for docs on that... >> >> Jim >> >> >> Yonik Seeley wrote: >>> >>> A function query might fit your needs... you could move some or all of >>> your external ranking system into Solr. >>> >>> -Yonik >>> >>> On Tue, Jul 29, 2008 at 7:08 PM, Jim Murphy <[EMAIL PROTECTED]> >>> wrote: >>>> >>>> I need to store 100 million documents in our Solr instance and be able >>>> to >>>> retrieve them with simple term queries - keyword matches. I'm NOT >>>> implementing a search application where documents are scored and >>>> ranked...they either match the keywords or not. Also, I have an >>>> external >>>> ranking system that I need to use to filter and order the search >>>> results. >>>> >>>> My requirements are for the very fast and reliable retrieval so I'm >>>> trying >>>> to figure a place to hook in or customize Solr/Lucene to just do the >>>> simplest thing, reliably and fast. >>>> >>>> 1. A naive approach would be to implement a handler, let the query >>>> happen >>>> normally then perform N lookups to my external scoring system then >>>> filter >>>> and sort the documents. It seems I may be doing a lot of extra work >>>> that >>>> way, especially with paging results and who knows what I'd doing to the >>>> cache. >>>> >>>> 2. 
Create a custom FieldType that is virtual and calls out to my >>>> external >>>> system? Then queries could be written to return all docs > my rank. >>>> >>>> 3. Implement custom Query, Weight, Scorer (et al) implementations to >>>> minimize the "Search Stuff" and just delegate calls to my external >>>> ranking >>>> system. >>>> >>>> 4. A filter of some kind? >>>> >>>> >>>> I'd love to get a sanity check on any of these approaches or some >>>> recommendations. >>>> >>>> Thanks >>>> >>>> Jim >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18723877.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18724269.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr for Info Retreval not so much Search...
I take it I can add my own functions that would take care of calling out to my external ranking system? Looking for docs on that... Jim Yonik Seeley wrote: > > A function query might fit your needs... you could move some or all of > your external ranking system into Solr. > > -Yonik > > On Tue, Jul 29, 2008 at 7:08 PM, Jim Murphy <[EMAIL PROTECTED]> wrote: >> >> I need to store 100 million documents in our Solr instance and be able to >> retrieve them with simple term queries - keyword matches. I'm NOT >> implementing a search application where documents are scored and >> ranked...they either match the keywords or not. Also, I have an external >> ranking system that I need to use to filter and order the search results. >> >> My requirements are for the very fast and reliable retrieval so I'm >> trying >> to figure a place to hook in or customize Solr/Lucene to just do the >> simplest thing, reliably and fast. >> >> 1. A naive approach would be to implement a handler, let the query happen >> normally then perform N lookups to my external scoring system then filter >> and sort the documents. It seems I may be doing a lot of extra work that >> way, especially with paging results and who knows what I'd doing to the >> cache. >> >> 2. Create a custom FieldType that is virtual and calls out to my external >> system? Then queries could be written to return all docs > my rank. >> >> 3. Implement custom Query, Weight, Scorer (et al) implementations to >> minimize the "Search Stuff" and just delegate calls to my external >> ranking >> system. >> >> 4. A filter of some kind? >> >> >> I'd love to get a sanity check on any of these approaches or some >> recommendations. >> >> Thanks >> >> Jim > > -- View this message in context: http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18723877.html Sent from the Solr - User mailing list archive at Nabble.com.
Using Solr for Info Retreval not so much Search...
I need to store 100 million documents in our Solr instance and be able to retrieve them with simple term queries - keyword matches. I'm NOT implementing a search application where documents are scored and ranked...they either match the keywords or not. Also, I have an external ranking system that I need to use to filter and order the search results. My requirements are for very fast and reliable retrieval, so I'm trying to figure out a place to hook in or customize Solr/Lucene to just do the simplest thing, reliably and fast. 1. A naive approach would be to implement a handler, let the query happen normally, then perform N lookups to my external scoring system, then filter and sort the documents. It seems I may be doing a lot of extra work that way, especially with paging results - and who knows what I'd be doing to the cache. 2. Create a custom FieldType that is virtual and calls out to my external system? Then queries could be written to return all docs > my rank. 3. Implement custom Query, Weight, Scorer (et al) implementations to minimize the "Search Stuff" and just delegate calls to my external ranking system. 4. A filter of some kind? I'd love to get a sanity check on any of these approaches or some recommendations. Thanks Jim -- View this message in context: http://www.nabble.com/Using-Solr-for-Info-Retreval-not-so-much-Search...-tp18723102p18723102.html Sent from the Solr - User mailing list archive at Nabble.com.