RE: Using the highlighter from the sandbox with a prefix query.
Thank you, this helped a lot.

Michael Celona

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, February 21, 2005 11:55 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.

On Feb 21, 2005, at 10:53 AM, Michael Celona wrote:

> That's the only stack I get. One thing to mention: I am using a
> MultiSearcher to rewrite the queries. I tried...
>
>     query = searcher_last.rewrite( query );
>     query = searcher_cur.rewrite( query );
>
> using IndexSearcher and I don't get an error... However, I am not able to
> highlight wildcard queries.

I use Highlighter for lucenebook.com and have two indexes that I search with MultiSearcher. Here's how I highlight:

    IndexReader reader = readers[indexIndex];
    QueryScorer scorer = new QueryScorer(query.rewrite(reader));
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", "");
    Highlighter highlighter = new Highlighter(formatter, scorer);

I get the appropriate IndexReader for the document being highlighted. You can get the index _index_ this way:

    int indexIndex = searcher.subSearcher(hits.id(position));

Hope this helps.

    Erik

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
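Editor's sketch: Erik's `searcher.subSearcher(hits.id(position))` call maps a document number from the combined result set back to the index it came from. The plain-Java sketch below illustrates that idea under the assumption that each sub-index owns one contiguous block of document numbers. `SubIndexLookup`, `starts`, and `localDocId` are hypothetical names for illustration, not Lucene source.

```java
// Hypothetical illustration of mapping a merged doc number to a sub-index.
// Assumption: sub-index i owns doc numbers starts[i] .. starts[i] + docCounts[i] - 1.
final class SubIndexLookup {
    private final int[] starts; // first merged doc number of each sub-index
    private final int total;    // total number of documents across all sub-indexes

    SubIndexLookup(int[] docCounts) {
        starts = new int[docCounts.length];
        int next = 0;
        for (int i = 0; i < docCounts.length; i++) {
            starts[i] = next;
            next += docCounts[i];
        }
        total = next;
    }

    // Returns the position of the sub-index that holds the merged doc number.
    int subIndex(int mergedDocId) {
        if (mergedDocId < 0 || mergedDocId >= total) {
            throw new IllegalArgumentException("doc id out of range: " + mergedDocId);
        }
        // Scan from the end so that empty sub-indexes (which share a start) are skipped.
        for (int i = starts.length - 1; i >= 0; i--) {
            if (mergedDocId >= starts[i]) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable for a valid doc id");
    }

    // Returns the doc number relative to its own sub-index.
    int localDocId(int mergedDocId) {
        return mergedDocId - starts[subIndex(mergedDocId)];
    }
}
```

With the sub-index position in hand, the matching reader can be picked from an array, as in `readers[indexIndex]` above.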
RE: Using the highlighter from the sandbox with a prefix query.
That's the only stack I get. One thing to mention: I am using a MultiSearcher to rewrite the queries. I tried...

    query = searcher_last.rewrite( query );
    query = searcher_cur.rewrite( query );

using IndexSearcher and I don't get an error... However, I am not able to highlight wildcard queries.

Michael

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, February 21, 2005 10:32 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.

On Feb 21, 2005, at 10:20 AM, Michael Celona wrote:

> I am using
>
>     query = searcher.rewrite( query );
>
> and it is throwing java.lang.UnsupportedOperationException.
>
> Am I able to use the searcher rewrite method like this?

What's the full stack trace?

    Erik
RE: Using the highlighter from the sandbox with a prefix query.
I am using

    query = searcher.rewrite( query );

and it is throwing java.lang.UnsupportedOperationException.

Am I able to use the searcher rewrite method like this?

Thanks,
Michael

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 17, 2005 4:09 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.

On Thursday 17 February 2005 08:37, lucuser4851 wrote:

> We have been using the highlighter from the lucene sandbox, which works
> very nicely most of the time. However when we try and use it with a
> prefix query (which is what you get having parsed a wild-card query), it
> doesn't return any highlighted sections. Has anyone else experienced
> this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the highlighter.

Regards
 Daniel
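Editor's sketch: the reason rewrite() matters is that a prefix query names no concrete terms until it is expanded against the terms actually present in the index, and a term-matching highlighter has nothing to match before that. The plain-Java sketch below illustrates the idea only. `PrefixHighlightSketch`, `expandPrefix`, and `highlight` are hypothetical names, and the `<b>` markers and whitespace tokenizing are assumptions, not how the sandbox Highlighter is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration: expand a prefix into concrete terms, then highlight them.
final class PrefixHighlightSketch {

    // Collects every indexed term that starts with the prefix.
    // In a sorted set these terms form one contiguous run beginning at tailSet(prefix).
    static SortedSet<String> expandPrefix(SortedSet<String> indexedTerms, String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the run of matching terms
            }
            matches.add(term);
        }
        return matches;
    }

    // Wraps each space-separated token whose lowercase form is one of the terms.
    static String highlight(String text, Set<String> terms) {
        List<String> parts = new ArrayList<>();
        for (String token : text.split(" ")) {
            if (terms.contains(token.toLowerCase(Locale.ROOT))) {
                parts.add("<b>" + token + "</b>");
            } else {
                parts.add(token);
            }
        }
        return String.join(" ", parts);
    }
}
```

Highlighting with the bare prefix as the only "term" marks nothing, while highlighting with the expanded term set marks every word the prefix matched, which mirrors the behaviour described in this thread.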
RE: Search Performance
Just tried that... works like a charm... thanks...

Michael

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 4:42 PM
To: Lucene Users List; Chris Lamprecht
Subject: Re: Search Performance

Or you could just open a new IndexSearcher, forget the old one, and have GC collect it when everyone is done with it.

Otis

--- Chris Lamprecht <[EMAIL PROTECTED]> wrote:

> I should have mentioned, the reason for not doing this the obvious,
> simple way (just close the Searcher and reopen it if a new version is
> available) is because some threads could be in the middle of iterating
> through the search Hits. If you close the Searcher they get a Bad
> file descriptor IOException. As I found out the hard way :)
>
> On Fri, 18 Feb 2005 15:03:29 -0600, Chris Lamprecht
> <[EMAIL PROTECTED]> wrote:
> > I recently dealt with the issue of re-using a Searcher with an index
> > that changes often. I wrote a class that allows my searching classes
> > to "check out" a lucene Searcher, perform a search, and then return
> > the Searcher. It's similar to a database connection pool, except that
RE: Search Performance
Thanks... I am seeing this problem right now. Has anyone implemented a better solution...?

Michael

-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 4:14 PM
To: Lucene Users List
Subject: Re: Search Performance

I should have mentioned, the reason for not doing this the obvious, simple way (just close the Searcher and reopen it if a new version is available) is because some threads could be in the middle of iterating through the search Hits. If you close the Searcher they get a Bad file descriptor IOException. As I found out the hard way :)

On Fri, 18 Feb 2005 15:03:29 -0600, Chris Lamprecht <[EMAIL PROTECTED]> wrote:

> I recently dealt with the issue of re-using a Searcher with an index
> that changes often. I wrote a class that allows my searching classes
> to "check out" a lucene Searcher, perform a search, and then return
> the Searcher. It's similar to a database connection pool, except that
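Editor's sketch: Chris's class is not shown in the thread, so the following is only one possible reading of the "check out / return" idea, using a use count so that a replaced searcher is closed once its last user has returned it. `CheckoutPool`, `FakeSearcher`, and the `closer` callback are hypothetical names. With Lucene, the callback would presumably close the real searcher.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Stand-in for a real searcher; it only records whether it has been closed.
final class FakeSearcher {
    boolean closed;
}

// Hypothetical check-out/check-in holder with per-searcher use counts.
final class CheckoutPool<S> {
    private final Map<S, Integer> users = new IdentityHashMap<>();
    private final Consumer<S> closer; // how to close a retired searcher
    private S current;

    CheckoutPool(S initial, Consumer<S> closer) {
        this.current = initial;
        this.closer = closer;
        users.put(initial, 0);
    }

    // Hands out the current searcher and counts the caller as a user.
    synchronized S checkOut() {
        users.merge(current, 1, Integer::sum);
        return current;
    }

    // Returns a searcher; a retired one is closed when its last user returns it.
    synchronized void checkIn(S searcher) {
        Integer count = users.get(searcher);
        if (count == null || count == 0) {
            throw new IllegalStateException("searcher was not checked out");
        }
        users.put(searcher, count - 1);
        closeIfIdle(searcher);
    }

    // Makes a fresh searcher current; the old one stays open while still in use.
    synchronized void replace(S fresh) {
        if (fresh == current) {
            return;
        }
        S old = current;
        current = fresh;
        users.put(fresh, 0);
        closeIfIdle(old);
    }

    private void closeIfIdle(S searcher) {
        if (searcher != current && users.get(searcher) == 0) {
            users.remove(searcher);
            closer.accept(searcher);
        }
    }
}
```

A caller would wrap its search in try/finally so that `checkIn` always runs, which avoids closing a searcher while another thread is still iterating over its hits.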
RE: Search Performance
I am using the highlighter... does this matter?

-----Original Message-----
From: David Spencer [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 2:05 PM
To: Lucene Users List
Subject: Re: Search Performance

Are you using the highlighter or doing anything non-trivial in displaying the results?

Are the pages being compressed (mod_gzip or some servlet equivalent)? This definitely helps, though to see the effect you may have to make sure your simulated users are "remote".

Also consider caching search results if it's reasonable to assume users may search for the same things.

I made some measurements on caching on my site:

http://www.searchmorph.com/weblog/index.php?id=41
http://www.searchmorph.com/weblog/index.php?id=40

And I use OSCache:

http://www.searchmorph.com/weblog/index.php?id=38
http://www.opensymphony.com/oscache/

Michael Celona wrote:

> What is single handedly the best way to improve search performance? I have
> an index in the 2G range stored on the local file system of the searcher.
> Under a load test of 5 simultaneous users my average search time is ~4700
> ms. Under a load test of 10 simultaneous users my average search time is
> ~1 ms. I have given the JVM 2G of memory and am using a dual 3GHz
> Xeon. Any ideas?
>
> Michael
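Editor's sketch: David uses OSCache for result caching. As a dependency-free illustration of the same idea, the sketch below keeps the most recently used results in a small LRU map built on `java.util.LinkedHashMap`. `ResultCache` and its size limit are hypothetical, and this is not how OSCache works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache keyed by query string; values would be rendered results.
final class ResultCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    ResultCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insert; evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

The map is not thread-safe on its own, so a multi-user search front end would wrap it with `Collections.synchronizedMap` or guard it with a lock. Cached entries also need to be dropped when the index changes.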
RE: Search Performance
My index is changing in real time constantly... in this case I guess this will not work for me. Any suggestions...?

Michael

-----Original Message-----
From: David Townsend [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 11:50 AM
To: Lucene Users List
Subject: RE: Search Performance

IndexSearchers are thread safe, so you can use the same object on multiple requests. If the index is static and not constantly updating, just keep one IndexSearcher for the life of the app. If the index changes and you need that instantly reflected in the results, you need to check whether the index has changed, and if it has, create a new cached IndexSearcher. To check for changes you'll need to monitor the version number of the index obtained via IndexReader.getCurrentVersion(Index Name).

David

-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: 18 February 2005 16:15
To: Lucene Users List
Subject: Re: Search Performance

Try a singleton pattern or a static field.

Stefan

Michael Celona wrote:

>I am creating new IndexSearchers... how do I cache my IndexSearcher...
>
>Michael
>
>-----Original Message-----
>From: David Townsend [mailto:[EMAIL PROTECTED]
>Sent: Friday, February 18, 2005 11:00 AM
>To: Lucene Users List
>Subject: RE: Search Performance
>
>Are you creating new IndexSearchers or IndexReaders on each search? Caching
>your IndexSearchers has a dramatic effect on speed.
>
>David Townsend
>
>-----Original Message-----
>From: Michael Celona [mailto:[EMAIL PROTECTED]
>Sent: 18 February 2005 15:55
>To: Lucene Users List
>Subject: Search Performance
>
>What is single handedly the best way to improve search performance? I have
>an index in the 2G range stored on the local file system of the searcher.
>Under a load test of 5 simultaneous users my average search time is ~4700
>ms. Under a load test of 10 simultaneous users my average search time is
>~1 ms. I have given the JVM 2G of memory and am using a dual 3GHz
>Xeon. Any ideas?
>
>Michael
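Editor's sketch: David Townsend's advice (keep one cached searcher and replace it only when the index version changes) can be outlined without Lucene as below. `VersionedCache`, `currentVersion`, and `opener` are hypothetical names. In a real application the version supplier would presumably wrap the `IndexReader.getCurrentVersion(...)` call he mentions, and the opener would construct the new IndexSearcher.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder that reopens its cached object only when the version changes.
final class VersionedCache<S> {
    private final LongSupplier currentVersion; // reports the index version right now
    private final Supplier<S> opener;          // opens a fresh searcher
    private S cached;
    private long cachedVersion;

    VersionedCache(LongSupplier currentVersion, Supplier<S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached searcher, opening a new one if the index has changed.
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            // The old instance is simply dropped, not closed, so callers that
            // are still using it are not cut off mid-search.
            cached = opener.get();
            cachedVersion = version;
        }
        return cached;
    }
}
```

For an index that changes constantly, the version check could be rate-limited (for example, at most once every few seconds) so that a new searcher is not opened on nearly every request. That interval is a tuning choice the thread does not specify.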
RE: Search Performance
I am creating new IndexSearchers... how do I cache my IndexSearcher...

Michael

-----Original Message-----
From: David Townsend [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 11:00 AM
To: Lucene Users List
Subject: RE: Search Performance

Are you creating new IndexSearchers or IndexReaders on each search? Caching your IndexSearchers has a dramatic effect on speed.

David Townsend

-----Original Message-----
From: Michael Celona [mailto:[EMAIL PROTECTED]
Sent: 18 February 2005 15:55
To: Lucene Users List
Subject: Search Performance

What is single handedly the best way to improve search performance? I have an index in the 2G range stored on the local file system of the searcher. Under a load test of 5 simultaneous users my average search time is ~4700 ms. Under a load test of 10 simultaneous users my average search time is ~1 ms. I have given the JVM 2G of memory and am using a dual 3GHz Xeon. Any ideas?

Michael
Search Performance
What is single handedly the best way to improve search performance? I have an index in the 2G range stored on the local file system of the searcher. Under a load test of 5 simultaneous users my average search time is ~4700 ms. Under a load test of 10 simultaneous users my average search time is ~1 ms. I have given the JVM 2G of memory and am using a dual 3GHz Xeon. Any ideas?

Michael
java.io.IOException: Stale NFS file handle
Has anyone seen this?

java.io.IOException: Stale NFS file handle
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
    at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readBytes(InputStream.java:57)
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:102)
    at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:361)
    at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:366)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:268)
    at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:343)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:327)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:170)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:141)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:51)
    at org.apache.lucene.search.Searcher.search(Searcher.java:49)

I get this during a load test of 5 simultaneous users. I have the index NFS mounted from an "indexer box", which holds the index, to an application server (Tomcat). My index is constantly being added to.

Search performance is in the 4 second range (queryString of "the") on an index of about 2G (as of now). Does anyone know how I can speed this up? Any insight would be greatly appreciated.

Michael
RE: Similarity coord,lengthNorm
Would fixing the lengthNorm to 1 fix this problem?

Michael

-----Original Message-----
From: Michael Celona [mailto:[EMAIL PROTECTED]
Sent: Monday, February 07, 2005 8:48 AM
To: Lucene Users List
Subject: Similarity coord,lengthNorm

I have varying length text fields which I am searching on. I would like relevancy to be dictated predominantly by the number of terms in my query that match. Right now I am seeing a high relevancy for a single word matching in a small document even though not all the terms in my query match. Does anyone have an example of a custom Similarity subclass which overrides the coord and lengthNorm methods?

Thanks..
Michael
Similarity coord,lengthNorm
I have varying length text fields which I am searching on. I would like relevancy to be dictated predominantly by the number of terms in my query that match. Right now I am seeing a high relevancy for a single word matching in a small document even though not all the terms in my query match. Does anyone have an example of a custom Similarity subclass which overrides the coord and lengthNorm methods?

Thanks..
Michael
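Editor's sketch: no reply with an example appears in this archive, so the following only illustrates the arithmetic behind the question. It assumes the default formulas `coord = overlap / maxOverlap` and `lengthNorm = 1 / sqrt(numTerms)`, and it uses a deliberately simplified score (coord times number of matching terms times length norm) that ignores tf, idf, and boosts. `CoordWeightedScoring` and its methods are hypothetical names, not a Lucene Similarity subclass.

```java
// Hypothetical, simplified scoring model to show the effect of a flat lengthNorm.
final class CoordWeightedScoring {

    // Assumed default: short fields get a larger norm than long ones.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The override asked about in the thread: every field length counts the same.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    // Fraction of query terms that the document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Simplified score: each matched term contributes one unit scaled by the norm.
    static float score(int overlap, int maxOverlap, float lengthNorm) {
        return coord(overlap, maxOverlap) * overlap * lengthNorm;
    }
}
```

Under these assumptions, a 4-term document matching 1 of 3 query terms scores about 0.167 with the default norm, ahead of a 400-term document matching all 3 terms at 0.15. With the flat norm the order reverses (about 0.33 versus 3.0), which is the behaviour the question is after. In Lucene itself the equivalent change would presumably be a Similarity subclass overriding `lengthNorm` and `coord`; note that length norms are stored at index time, so documents would likely need reindexing for a new `lengthNorm` to take effect.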
text highlighting
Does anyone have a working example of the highlighter class found in the sandbox?

-----Original Message-----
From: Jason Polites [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 5:34 PM
To: Lucene Users List
Subject: Re: Search Engine review article/book

Also:

http://labs.google.com/papers.html
http://research.microsoft.com/wsm/

----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, January 27, 2005 9:27 AM
Subject: Re: Search Engine review article/book

> + the lucene in action book. :-)
> + scholar.google.com
> + acm.org ir group
> + ieee.org has ir group as well
> Maybe you will find http://searchenginewatch.com/ useful as well.
>
> HTH
> Stefan
>
> On 26.01.2005 at 23:18, Xiaohong Yang ((Sharon)) wrote:
>
>> Hi all,
>>
>> I am looking for good review articles or books regarding latest search
>> engine development trend and practices. Any suggestions would be very
>> helpful. Any comments not covered by articles are also welcome.
>>
>> Thanks a lot,
>>
>> Sharon
>
> ---
> company: http://www.media-style.com
> forum: http://www.text-mining.org
> blog: http://www.find23.net