RemoteSearcher
Hello. Does anyone know of an application based on RemoteSearcher that distributes an index across many servers? Yura Smolsky, - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
> On Jan 5, 2005, at 3:46 PM, Bill Janssen wrote: > > Maybe I just misunderstand your release numbering policy. Typically, > > in a library project that has major, minor, and micro release numbers, > > I'd expect no API changes between micro releases of a single minor > > release; only backward-compatible API extensions between different > > minor releases of a single major release; possible wholesale API > > changes (not backward compatible) between different major releases. > > Is this the kind of thinking that you also have? > > Yes, absolutely. The flaw you have stumbled on was completely an > oversight and a mistake that should not have occurred. I, for one, > apologize for not catching it. Only because I have custom QueryParser > subclasses and lots of unit tests did I catch the signature changes > that I did, and I'm not sure how I missed this one. I have not gone > back, yet, to review the change history and whether my code is broken > in one of those versions of Lucene, or whether I've not overridden that > method. OK, then it's just a bug, and we all make bugs (me probably more than you, at that). Thanks for all your help with this, Erik. Bill
Span Query Performance
Hi all, I'm currently doing a query similar to the following: for w in wordset: query = w near (word1 V word2 V word3 ... V word1422); perform query. I am doing this through SpanQuery.getSpans(), iterating through the spans and counting the matches, which can result in 4782282 matches (essentially I am only after the match count). The query works but the performance can be somewhat slow, so I am wondering: a) Would the query potentially run faster if I used Searcher.search(query) with a custom Similarity, or do both methods essentially use the same mechanics? b) Does using a RAMDirectory improve query performance by any significant amount? c) Is there a faster method than what I am doing that I should consider? Thanks, Andrew
Re: Indexing flat files with out .txt extension
On Jan 5, 2005, at 6:31 PM, Hetan Shah wrote: How can one index simple text files without the .txt extension? I am trying to use IndexFiles and IndexHTML but not to my satisfaction. In IndexFiles I do not get any control over the content of the file, and in the case of IndexHTML the files without any extension do not get indexed at all. Any pointers are really appreciated. Try out the Indexer code from Lucene in Action. You can download it from the link here: http://www.lucenebook.com/blog/announcements/sourcecode.html It'll be cleaner to follow and borrow from. The code that ships with Lucene is for demonstration purposes. It surprises me how often folks use that code to build real indexes. It's quite straightforward to create your own Java code to do the indexing in whatever manner you like, borrowing from examples. When you get the download unpacked, simply run "ant Indexer" to see it in action. And then "ant Searcher" to search the index just built. Erik
Indexing flat files with out .txt extension
Hello, How can one index simple text files without the .txt extension? I am trying to use IndexFiles and IndexHTML but not to my satisfaction. In IndexFiles I do not get any control over the content of the file, and in the case of IndexHTML the files without any extension do not get indexed at all. Any pointers are really appreciated. Thanks. -H
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
Hello Bill, "I feel your pain" ;) But seriously, there was a QueryParser mess-up in the recent minor releases. I think this is the first time we've messed up backward compatibility in the last ~4 years. Lucene's public API is very 'narrow', and typically very stable. What we did with QueryParser was the result of 'overeagerness', but it is really out of character for Lucene. Otis --- Bill Janssen <[EMAIL PROTECTED]> wrote: > Doug, > > My application (see http://www.parc.com/janssen/pubs/TR-03-16.pdf for > details) is not just a Java app (you're probably not surprised :-). > It requires about a dozen other packages to be installed on a machine, > before building from source. The Python Imaging Library, ReportLab, > libtiff, libpng, xpdf, htmldoc, etc. Lucene is one of these > prerequisites. I don't include any other outside code with my tar > file; not sure why Lucene should be the only one to require this. > > Besides, I'd like to keep up with the continuous improvements in > Lucene. I don't want to be stuck with 1.4.1 forever. > > Please understand that I'm not trying to push your project in any > particular direction. I'm just trying to understand whether Lucene is > usable for my project. If every micro-release of Lucene means that I > will potentially have to re-write my code, I may have to look for a > library with a more stable API. > > Maybe I just misunderstand your release numbering policy. Typically, > in a library project that has major, minor, and micro release numbers, > I'd expect no API changes between micro releases of a single minor > release; only backward-compatible API extensions between different > minor releases of a single major release; possible wholesale API > changes (not backward compatible) between different major releases. > Is this the kind of thinking that you also have? > > I can certainly understand that when you find improvements you'd like > to make in the API, you'd want to put them in. I just think it's > important not to break existing code without bumping the release > number, so that a user can say, "This works with Lucene 1.4". Right > now, that can't be said. > > Bill > > Doug Cutting wrote: > > Bill, most folks bundle appropriate versions of required jars with their > > applications to avoid this sort of problem. How are you deploying > > things? Are you not bundling a compatible version of the lucene jar > > with each release of your application? If not, why not?
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
On Jan 5, 2005, at 3:46 PM, Bill Janssen wrote: Maybe I just misunderstand your release numbering policy. Typically, in a library project that has major, minor, and micro release numbers, I'd expect no API changes between micro releases of a single minor release; only backward-compatible API extensions between different minor releases of a single major release; possible wholesale API changes (not backward compatible) between different major releases. Is this the kind of thinking that you also have? Yes, absolutely. The flaw you have stumbled on was completely an oversight and a mistake that should not have occurred. I, for one, apologize for not catching it. Only because I have custom QueryParser subclasses and lots of unit tests did I catch the signature changes that I did, and I'm not sure how I missed this one. I have not gone back, yet, to review the change history and whether my code is broken in one of those versions of Lucene, or whether I've not overridden that method. In short - we screwed up, and we should fix it since it's obviously important to you. Erik
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
On Jan 5, 2005, at 3:48 PM, Bill Janssen wrote: In 1.4.1 or 1.4.3? Both - my suggestion was an attempt to get you something that would work in both versions. Erik On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote: Let me be a bit more explicit. My method (essentially an after-method, for those Lisp'rs out there) begins thusly: protected Query getFieldQuery (String field, Analyzer a, String queryText) throws ParseException { Query x = super.getFieldQuery(field, a, queryText); ... } If I remove the "Analyzer a" from both the signature and the super call, the super call won't compile because that method isn't in the QueryParser in 1.4.1. But my getFieldQuery() method won't even be called in 1.4.1, because it doesn't exist in that version of the QueryParser. Will it work if you override this method also? protected Query getFieldQuery(String field, Analyzer analyzer, String queryText, int slop) Erik
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
In 1.4.1 or 1.4.3?

> On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote:
> > Let me be a bit more explicit. My method (essentially an after-method, for those Lisp'rs out there) begins thusly:
> >
> > protected Query getFieldQuery(String field,
> >                               Analyzer a,
> >                               String queryText)
> >         throws ParseException {
> >
> >     Query x = super.getFieldQuery(field, a, queryText);
> >
> >     ...
> > }
> >
> > If I remove the "Analyzer a" from both the signature and the super call, the super call won't compile because that method isn't in the QueryParser in 1.4.1. But my getFieldQuery() method won't even be called in 1.4.1, because it doesn't exist in that version of the QueryParser.
>
> Will it work if you override this method also?
>
> protected Query getFieldQuery(String field,
>                               Analyzer analyzer,
>                               String queryText,
>                               int slop)
>
> Erik
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
Doug, My application (see http://www.parc.com/janssen/pubs/TR-03-16.pdf for details) is not just a Java app (you're probably not surprised :-). It requires about a dozen other packages to be installed on a machine, before building from source. The Python Imaging Library, ReportLab, libtiff, libpng, xpdf, htmldoc, etc. Lucene is one of these prerequisites. I don't include any other outside code with my tar file; not sure why Lucene should be the only one to require this. Besides, I'd like to keep up with the continuous improvements in Lucene. I don't want to be stuck with 1.4.1 forever. Please understand that I'm not trying to push your project in any particular direction. I'm just trying to understand whether Lucene is usable for my project. If every micro-release of Lucene means that I will potentially have to re-write my code, I may have to look for a library with a more stable API. Maybe I just misunderstand your release numbering policy. Typically, in a library project that has major, minor, and micro release numbers, I'd expect no API changes between micro releases of a single minor release; only backward-compatible API extensions between different minor releases of a single major release; possible wholesale API changes (not backward compatible) between different major releases. Is this the kind of thinking that you also have? I can certainly understand that when you find improvements you'd like to make in the API, you'd want to put them in. I just think it's important not to break existing code without bumping the release number, so that a user can say, "This works with Lucene 1.4". Right now, that can't be said. Bill Doug Cutting wrote: > Bill, most folks bundle appropriate versions of required jars with their > applications to avoid this sort of problem. How are you deploying > things? Are you not bundling a compatible version of the lucene jar > with each release of your application? If not, why not? 
multi-threaded thru-put in lucene
Hi folks: We are trying to measure the throughput of Lucene in a multi-threaded environment. This is what we found: 1 thread, search takes 20 ms; 2 threads, search takes 40 ms; 5 threads, search takes 100 ms. It seems that under a multi-threaded scenario throughput isn't good; performance is no better than that of 1 thread. I tried sharing an IndexSearcher among all threads as well as having an IndexSearcher per thread. Both yield the same numbers. Is this consistent with what you'd expect? Thanks -John
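One way to double-check numbers like these is to measure wall-clock time for a whole batch of concurrent tasks rather than per-call latency: on a saturated CPU, per-call latency grows with the thread count even when total throughput is unchanged. A plain-Java harness for that kind of measurement might look like the sketch below (no Lucene here; the busy loop is only a stand-in for a search call, and `ThroughputCheck` is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThroughputCheck {
    // Run `tasks` copies of a dummy "search" on `threads` pool threads and
    // return how many completed. Timing the whole call with System.nanoTime()
    // and dividing by `tasks` gives throughput, as opposed to per-call latency.
    public static int run(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            results.add(pool.submit((Callable<Long>) () -> {
                long sum = 0;                      // stand-in for a search call
                for (int j = 0; j < 100_000; j++) sum += j;
                return sum;
            }));
        }
        int completed = 0;
        for (Future<Long> f : results) {
            f.get();                               // wait for each task
            completed++;
        }
        pool.shutdown();
        return completed;
    }
}
```

If total time for the batch stays roughly constant as threads are added while per-call latency grows, the machine is CPU-bound rather than Lucene serializing the searches.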
Question about Analyzer and words spelled in different languages
Hi ALL, We are trying to index scientific articles written in English, but whose authors' names can be spelled in any language (depending on the author's nationality), e.g. Schäffer. In the XML document that we provide to Lucene, the author name is written using HTML entities, i.e. Sch&auml;ffer. So in practice that is the name that would be given to a Lucene analyzer/filter. Is there an already written analyzer that would take that name (Sch&auml;ffer, or any other name that contains entities) so that, once the field has been indexed, the Lucene index could be searched both for the real version of the name, which is Schäffer, and the English-spelled version of the name, which is Schaffer? Thanks a lot in advance for your help, Mariella
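In the absence of a ready-made analyzer, one plain-Java approach (a sketch, not a Lucene filter; `NameFolder` and its methods are illustrative names, and the tiny entity table here is an assumption standing in for a full HTML-entity decoder) is to decode the entities first and then strip combining marks, so both spellings can be produced at index time:

```java
import java.text.Normalizer;

public class NameFolder {
    // Decode a few common HTML entities; a real indexer would use a full
    // entity table or an XML parser instead of this hand-rolled map.
    static String decodeEntities(String s) {
        return s.replace("&auml;", "ä")
                .replace("&ouml;", "ö")
                .replace("&uuml;", "ü");
    }

    // Fold accented characters to their ASCII base letters: NFD decomposition
    // splits "ä" into "a" plus a combining diaeresis, which \p{M} then strips,
    // so "Schäffer" folds to "Schaffer".
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

Indexing both `decodeEntities(raw)` and `foldAccents(decodeEntities(raw))` into the author field would let queries for either Schäffer or Schaffer match.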
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
Bill Janssen wrote: Sure, if I wanted to ship different code for each micro-release of Lucene (which, you might guess, I don't). That signature doesn't compile with 1.4.1. Bill, most folks bundle appropriate versions of required jars with their applications to avoid this sort of problem. How are you deploying things? Are you not bundling a compatible version of the lucene jar with each release of your application? If not, why not? I'm not trying to be difficult, just trying to understand. Thanks, Doug
Re: simultaneous index/search/delete
Any index-modifying operations need to be serialized. Searching is read-only and can be done in parallel with anything else. See http://www.lucenebook.com/search?query=concurrent for some hints. Otis --- Alex Kiselevski <[EMAIL PROTECTED]> wrote: > > Concerning the question about simultaneous index/search/delete: > Do I have to put synchronized on methods that call the API functions of > index/search/delete? > > > The information contained in this message is proprietary of Amdocs, > protected from disclosure, and may be privileged. > The information is intended to be conveyed only to the designated > recipient(s) of the message. If the reader of this message is not the intended > recipient, you are hereby notified that any dissemination, use, distribution or > copying of this communication is strictly prohibited and may be unlawful. > If you have received this communication in error, please notify us immediately > by replying to the message and deleting it from your computer. > Thank you.
Re: PDFBox deprecated methods
Daniel, Yes, that getText( PDDocument ) is the method you should be using. You no longer need to use a COSDocument object; please note the following methods that go along with the deprecation of getText( COSDocument ):

PDFParser.getPDDocument() - to get a PDDocument instead of a COSDocument after parsing
PDDocument.load() - a convenience method that does all the PDFParser stuff and returns a PDDocument
LucenePDFDocument.getDocument() - to go straight from a File/URL to a lucene document object

Ben

Quoting Daniel Cortes <[EMAIL PROTECTED]>: > Ok, I reply to myself: the deprecated method is getText(COSDocument). > If you do stripper.getText(new PDDocument(cosDoc)) there isn't any problem. > > Excuse me for the question. > > Daniel Cortes wrote: > > I've been using PDFBox to index a directory. I've downloaded the > > latest version of PDFBox (0.6.7.a) and I've seen that the method I use > > to extract text, PDFTextStripper.getText(), is deprecated: > > stripper.getText(new PDDocument(cosDoc)); > > I know a lot of people use this method as I do. What are the alternative > > options?
Re: searching while indexing.
On Wednesday 05 January 2005 12:14, Morus Walter wrote: > Peter Veentjer - Anchor Men writes: > > >>Is your IndexReader doing deletes? > > Yes.. I have to remove the documents I'm going to update from the > > Reader. > > > > >>That is the only time it locks the index (because that is essentially > > >>a write operation). If you're purely searching with the reader it > > >>should work fine with a writer concurrently. > > > > Ok, I understand why there are problems. But how can I fix this problem? > > I have to update documents, so how can I do this without deleting > > documents from the Reader? I don't want to add the same document twice. > > > You have to bundle all writes at one point and serialize deletions and > imports. > That is: > open a reader for deleting > delete the documents to be deleted > close that reader > open a writer for adding content > add documents > close that writer > begin at start. > > It's up to you, whether you open a reader to delete single documents and > a writer for adding a single document or use batches of several documents, > but you cannot escape the need to serialize the writes. And while this updating is going on, you can keep another reader open for searching; it will not be affected by the updates. After all updates are done, close that reader and reopen another one to see the updates. Regards, Paul Elschot
Re: PDFBox deprecated methods
Ok, I reply to myself: the deprecated method is getText(COSDocument). If you do stripper.getText(new PDDocument(cosDoc)) there isn't any problem. Excuse me for the question. Daniel Cortes wrote: I've been using PDFBox to index a directory. I've downloaded the latest version of PDFBox (0.6.7.a) and I've seen that the method I use to extract text, PDFTextStripper.getText(), is deprecated: stripper.getText(new PDDocument(cosDoc)); I know a lot of people use this method as I do. What are the alternative options?
PDFBox deprecated methods
I've been using PDFBox to index a directory. I've downloaded the latest version of PDFBox (0.6.7.a) and I've seen that the method I use to extract text, PDFTextStripper.getText(), is deprecated: stripper.getText(new PDDocument(cosDoc)); I know a lot of people use this method as I do. What are the alternative options?
simultaneous index/search/delete
Concerning the question about simultaneous index/search/delete: do I have to put synchronized on methods that call the API functions of index/search/delete?
RE: searching while indexing.
Peter Veentjer - Anchor Men writes: > >>Is your IndexReader doing deletes? > Yes.. I have to remove the documents I'm going to update from the > Reader. > > >>That is the only time it locks the index (because that is essentially > >>a write operation). If you're purely searching with the reader it > >>should work fine with a writer concurrently. > > Ok, I understand why there are problems. But how can I fix this problem? > I have to update documents, so how can I do this without deleting > documents from the Reader? I don't want to add the same document twice. >
You have to bundle all writes at one point and serialize deletions and imports. That is:

open a reader for deleting
delete the documents to be deleted
close that reader
open a writer for adding content
add documents
close that writer
begin at start.

It's up to you whether you open a reader to delete single documents and a writer for adding a single document, or use batches of several documents, but you cannot escape the need to serialize the writes. HTH Morus
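The cycle above can be modeled in plain Java (a sketch only, with no Lucene involved: the HashMap stands in for the index, `BatchedUpdater` is an illustrative name, and with Lucene the delete phase would use an IndexReader and the add phase an IndexWriter). The property the reader/writer dance buys you is that deletes and adds run in serialized, non-interleaved phases:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedUpdater {
    private final Map<String, String> index = new HashMap<>();

    // One serialized update cycle: a delete phase, then an add phase.
    // `synchronized` plays the role of the index lock: no two update
    // cycles can interleave their phases.
    public synchronized void applyBatch(List<String> deletes, Map<String, String> adds) {
        for (String id : deletes) {   // "open a reader for deleting ... close it"
            index.remove(id);
        }
        index.putAll(adds);           // "open a writer ... add documents ... close it"
    }

    // Searches work against a snapshot and never block updates.
    public synchronized Map<String, String> snapshot() {
        return new HashMap<>(index);
    }
}
```

Updating a document is modeled here exactly as in the mail: its id goes into the delete batch and its new version into the add batch of the same cycle, so it is never present twice.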
Re: Advice on indexing content from a database
Hibernate + Lucene: use Hibernate to read from your DB; this will pull out the data you need in nice, clean objects. Then loop through your object collection and create Lucene documents. You can add Quartz to the equation and have this process run on a schedule over chunks of your data until it has all been indexed, and then continue on with incremental updates/deletes. Nader Henein [EMAIL PROTECTED] wrote: Hi, I'm working on integrating Lucene with a CMS. All the data is stored in a database; I'm looking at about 2 million records. Any advice on an effective technique to index this (incrementally or using threads) that would not overload my server? Thanks, Aneesha
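A minimal sketch of that scheduled, chunked loop, in plain Java with illustrative stand-ins: `fetchChunk` models a paged Hibernate query (setFirstResult/setMaxResults style) and `indexRecord` models building and adding a Lucene Document; neither is a real library call.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ChunkedIndexer {
    // Pull records in chunks (fetchChunk takes an offset and returns the next
    // chunk, empty when exhausted) and hand each record to indexRecord.
    // Returns the total number of records indexed.
    public static int indexAll(Function<Integer, List<String>> fetchChunk,
                               Consumer<String> indexRecord) {
        int offset = 0, total = 0;
        while (true) {
            List<String> chunk = fetchChunk.apply(offset);
            if (chunk.isEmpty()) break;       // all 2M records processed
            for (String record : chunk) {
                indexRecord.accept(record);
                total++;
            }
            offset += chunk.size();           // next scheduled run resumes here
        }
        return total;
    }
}
```

Bounding the chunk size is what keeps the server from being overloaded: each scheduled run touches only a slice of the 2 million rows, and the offset tracks progress between runs.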
Advice on indexing content from a database
Hi, I'm working on integrating Lucene with a CMS. All the data is stored in a database; I'm looking at about 2 million records. Any advice on an effective technique to index this (incrementally or using threads) that would not overload my server? Thanks, Aneesha
RE: searching while indexing.
>>Is your IndexReader doing deletes? Yes.. I have to remove the documents I'm going to update from the Reader. >>That is the only time it locks the index (because that is essentially >>a write operation). If you're purely searching with the reader it >>should work fine with a writer concurrently. Ok, I understand why there are problems. But how can I fix this problem? I have to update documents, so how can I do this without deleting documents from the Reader? I don't want to add the same document twice. This is a problem many users of Lucene will face.. Could you please add a good explanation to the FAQ?
Re: searching while indexing.
On Jan 5, 2005, at 5:12 AM, Peter Veentjer - Anchor Men wrote: -----Original Message----- From: Erik Hatcher [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 5 January 2005 10:58 To: Lucene Users List Subject: Re: searching while indexing. There are no problems searching while indexing. How are you experiencing otherwise? What error do you get? I have experienced (lock) problems if I use a Reader and Writer (on the same index directory) at the same time. My application is multithreaded (a pool of worker threads for the web requests) and a scheduled worker thread for signaling changes (new (normal) files, changed files and removed files) and updating the index. And I'm not the only one experiencing this problem... It (a Reader and Writer open at the same time) has been mentioned on the mailing list quite a few times. Is your IndexReader doing deletes? That is the only time it locks the index (because that is essentially a write operation). If you're purely searching with the reader it should work fine with a writer concurrently. Erik
Re: Réf. : Re: do a simple search
[EMAIL PROTECTED] writes: > I must change the request to make a search like this > > type=value AND (shortDesc=value OR longDesc=value) > > but I don't know how to do this ? > create a boolean query for (shortDesc=value OR longDesc=value) (as you do so far) and create another boolean query adding that boolean query and the query for type:product. For the latter, use add(<query>, true, false) to make both subqueries required. HTH Morus
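The clause combination Morus describes can be modeled with plain-Java predicates (a sketch only, no Lucene classes; in the 1.4-era API you would call BooleanQuery.add(query, true, false) to mark each of the two clauses required). The combined query then behaves like type:product AND (shortDesc:toto OR longDesc:toto):

```java
import java.util.Map;
import java.util.function.Predicate;

public class BooleanModel {
    // A document is modeled as a field-name -> value map; matching a field
    // plays the role of a TermQuery on that field.
    static Predicate<Map<String, String>> field(String name, String value) {
        return doc -> value.equals(doc.get(name));
    }

    // type:product AND (shortDesc:toto OR longDesc:toto) -- both top-level
    // clauses are "required", mirroring add(<query>, true, false).
    public static Predicate<Map<String, String>> query() {
        Predicate<Map<String, String>> type = field("type", "product");
        Predicate<Map<String, String>> desc =
            field("shortDesc", "toto").or(field("longDesc", "toto"));
        return type.and(desc);
    }
}
```

A document matching only the description clause but with a different type is rejected, which is exactly the behavior the required flag adds over the plain OR query.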
Réf. : Re: Réf. : Re: do a simple search
OK, thanks On Wed, 2005-01-05 at 10:56 +0100, [EMAIL PROTECTED] wrote: > I have alway another field "type" who is the type of the searched > document. > I must change the request to made search like this > > type=value AND (shortDesc=value OR longDesc=value) > > but I don't know how to do this ? > > here is the query with fields values > > Field name: type > Field value: product > Field name: shortDesc > Field value: toto > Field name: longDesc > Field value: toto > IndexManager query = type:product shortDesc:toto longDesc:toto For the type field I suggest using a TermQuery. Is the document type from a list of defined types? i.e. is it stored as a keyword and hence doesn't need parsing? For the other fields I recommend trying out the DistributingMultiFieldQueryParser class, which isn't in the main distro yet but can be found here: http://issues.apache.org/bugzilla/show_bug.cgi?id=32674 It handles all the awkward bits of making sure all fields are searched correctly. Then combine the two query objects in a BooleanQuery. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.
Re: Réf. : Re: do a simple search
On Wed, 2005-01-05 at 10:56 +0100, [EMAIL PROTECTED] wrote: > I have alway another field "type" who is the type of the searched > document. > I must change the request to made search like this > > type=value AND (shortDesc=value OR longDesc=value) > > but I don't know how to do this ? > > here is the query with fields values > > Field name: type > Field value: product > Field name: shortDesc > Field value: toto > Field name: longDesc > Field value: toto > IndexManager query = type:product shortDesc:toto longDesc:toto For the type field I suggest using a TermQuery. Is the document type from a list of defined types? i.e. is it stored as a keyword and hence doesn't need parsing? For the other fields I recommend trying out the DistributingMultiFieldQueryParser class, which isn't in the main distro yet but can be found here: http://issues.apache.org/bugzilla/show_bug.cgi?id=32674 It handles all the awkward bits of making sure all fields are searched correctly. Then combine the two query objects in a BooleanQuery. -- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.
RE: searching while indexing.
-----Original Message----- From: Erik Hatcher [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 5 January 2005 10:58 To: Lucene Users List Subject: Re: searching while indexing. There are no problems searching while indexing. How are you experiencing otherwise? What error do you get? I have experienced (lock) problems if I use a Reader and Writer (on the same index directory) at the same time. My application is multithreaded (a pool of worker threads for the web requests) and a scheduled worker thread for signaling changes (new (normal) files, changed files and removed files) and updating the index. And I'm not the only one experiencing this problem... It (a Reader and Writer open at the same time) has been mentioned on the mailing list quite a few times. A possible solution I have seen is creating a shadow index and switching the reader to that index when the writer is finished. But I don't understand why a Reader and a Writer cannot be open on the same directory at the same time.
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote: Let me be a bit more explicit. My method (essentially an after-method, for those Lisp'rs out there) begins thusly:

protected Query getFieldQuery(String field,
                              Analyzer a,
                              String queryText)
        throws ParseException {
    Query x = super.getFieldQuery(field, a, queryText);
    ...
}

If I remove the "Analyzer a" from both the signature and the super call, the super call won't compile because that method isn't in the QueryParser in 1.4.1. But my getFieldQuery() method won't even be called in 1.4.1, because it doesn't exist in that version of the QueryParser. Will it work if you override this method also?

protected Query getFieldQuery(String field,
                              Analyzer analyzer,
                              String queryText,
                              int slop)

My head is spinning looking at all the various signatures of this method we have and trying to backtrack where things went awry. Erik
Re: searching while indexing.
There are no problems searching while indexing. How are you experiencing otherwise? What error do you get? Erik On Jan 5, 2005, at 4:47 AM, Peter Veentjer - Anchor Men wrote: What is the best way to implement searching while indexing? I have read the mailing list for a while but haven't got a good answer to my question. It is not allowed to index while searching, but I don't understand why. All the segments are immutable, so after I have created a Reader it could use all the segments that are available at the moment. The reader maintains references to those segments, and if the reader is not needed anymore (or the writer says: I'm finished creating new indices... you can search through a newer set of segments) the reader could delete all the old segments. The writer can create new segments based on the immutable old ones and on the new documents. After it has created a new set, it can signal the reader to use the newer segments. So why is the above scenario not possible? Why are segments immutable? And what is the best way to add documents to a big index (>20 GB) without copying the index, and without blocking the search? With kind regards, Peter Veentjer Anchor Men Interactive Solutions - duidelijk in zakelijke internetoplossingen Praediniussingel 41 9711 AE Groningen T: 050-3115222 F: 050-5891696 E: [EMAIL PROTECTED] I: www.anchormen.nl
Réf. : Re: do a simple search
>On Jan 5, 2005, at 3:41 AM, [EMAIL PROTECTED] wrote: >> I would like to search for a word in different fields of a document with >> an OR operator. >> >> My fields are "id", "shortDesc" and "longDesc". >> In Java I want to search for a word simultaneously in the "shortDesc" and >> "longDesc" fields. >> >> for example: >> >> doc1: id:1 >> shortDesc: a foo desc >> longDesc: a doc long desc >> >> doc2: id:2 >> shortDesc: a doc short desc >> longDesc: a foo long desc >> >> doc3: id:3 >> shortDesc: another short desc >> longDesc: another long desc >> >> if the search word is "foo" I want to retrieve doc1 and doc3. >You meant doc1 and doc2. yes (sorry) >> in my program, fields are stored in the fieldName list. >> associated values are stored in fieldValue. >What's the question? The code you show below, at first glance, looks >reasonable, or at least close. What is the value of query.toString()? >Erik Sorry, but I found the problem. I always have another field "type" which is the type of the searched document. I must change the request to make a search like this

type=value AND (shortDesc=value OR longDesc=value)

but I don't know how to do this?
Here is the query with the field values:

Field name: type       Field value: product
Field name: shortDesc  Field value: toto
Field name: longDesc   Field value: toto

IndexManager query = type:product shortDesc:toto longDesc:toto

> private static Hits search(List fieldName, List fieldValue) {
>     Hits hits = null;
>
>     int fieldNameSize = fieldName.size();
>     int fieldValueSize = fieldValue.size();
>     if (fieldNameSize != fieldValueSize) {
>         return null;
>     }
>
>     IndexSearcher searcher = getSearcher();
>     if (searcher != null) {
>         BooleanQuery query = new BooleanQuery();
>         // populate the query with all terms
>         for (int i = 0; i < fieldNameSize; i++) {
>             String currentFieldName = (String) fieldName.get(i);
>             String currentFieldValue = (String) fieldValue.get(i);
>
>             StringTokenizer tokenizer = new StringTokenizer(currentFieldValue);
>             while (tokenizer.hasMoreTokens()) {
>                 String currentToken = tokenizer.nextToken();
>                 Term currentTerm = new Term(currentFieldName, currentToken);
>                 TermQuery termQuery = new TermQuery(currentTerm);
>
>                 query.add(termQuery, false, false);
>             }
>         }
>
>         // do the search
>         try {
>             // System.out.println("IndexManager query = " + query.toString());
>             hits = searcher.search(query);
>         }
>         catch (IOException ioe) {
>             LogManager.log(LogManager.LOG_ERROR, "Cannot search in index.", ioe);
>         }
>         finally {
>             try {
>                 searcher.close();
>             }
>             catch (IOException ioe) {
>                 LogManager.log(LogManager.LOG_WARNING, "Cannot close searcher in search method.", ioe);
>             }
>         }
>     }
>
>     return hits;
> }
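[Editor's note: the nested query asked for above, type=value AND (shortDesc=value OR longDesc=value), is built in Lucene by nesting one BooleanQuery inside another: the type TermQuery is added as a required clause, and an inner BooleanQuery holding the two optional clauses (added with (false, false), exactly as in the code above) is itself added as required. The clause semantics can be checked in plain Java; NestedBoolDemo below is a hypothetical illustration, not Lucene code:]

```java
import java.util.Map;

// Plain-Java model of: type=value AND (shortDesc=value OR longDesc=value).
// With Lucene 1.4's BooleanQuery the same shape would be:
//   outer.add(new TermQuery(new Term("type", type)), true, false);  // required
//   inner.add(shortDescQuery, false, false);                        // optional
//   inner.add(longDescQuery, false, false);                         // optional
//   outer.add(inner, true, false);                                  // required
public class NestedBoolDemo {
    static boolean matches(Map<String, String> doc, String type, String word) {
        boolean typeOk = type.equals(doc.get("type"));          // required clause
        boolean descOk = doc.get("shortDesc").contains(word)    // at least one of
                      || doc.get("longDesc").contains(word);    // the optional clauses
        return typeOk && descOk;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of(
                "type", "product",
                "shortDesc", "toto",
                "longDesc", "toto");
        System.out.println(matches(doc, "product", "toto")); // true
        System.out.println(matches(doc, "service", "toto")); // false: type differs
    }
}
```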
searching while indexing.
What is the best way to implement searching while indexing? I have read the mailing list for a while but haven't got a good answer to my question.

It is not allowed to index while searching, but I don't understand why. All the segments are immutable, so after I have created a Reader it could use all the segments that are available at that moment. The reader maintains references to those segments, and when the reader is no longer needed (or the writer says: I'm finished creating new indices, you can search through a newer set of segments) the reader could delete all the old segments. The writer can create new segments based on the immutable old ones and on the new documents. After it has created a new set, it can signal the reader to use the newer segments.

So why is the above scenario not possible? Why are segments immutable? And what is the best way to add documents to a big index (>20 GB) without copying the index, and without blocking the search?

Met vriendelijke groet,

Peter Veentjer
Anchor Men Interactive Solutions - duidelijk in zakelijke internetoplossingen
Praediniussingel 41
9711 AE Groningen
T: 050-3115222
F: 050-5891696
E: [EMAIL PROTECTED]
I: www.anchormen.nl
Re: do a simple search
On Jan 5, 2005, at 3:41 AM, [EMAIL PROTECTED] wrote:

> I would like to search a word in different fields of a document with an
> OR operator.
>
> My fields are "id", "shortDesc" and "longDesc". In Java I want to search
> a word simultaneously in the "shortDesc" and "longDesc" fields.
>
> For example:
>
> doc1: id: 1
>       shortDesc: a foo desc
>       longDesc: a doc long desc
>
> doc2: id: 2
>       shortDesc: a doc short desc
>       longDesc: a foo long desc
>
> doc3: id: 3
>       shortDesc: another short desc
>       longDesc: another long desc
>
> If the search word is "foo" I want to retrieve doc1 and doc3.

You meant doc1 and doc2.

> In my program, field names are stored in the fieldName list and the
> associated values are stored in fieldValue.

What's the question? The code you show below, at first glance, looks reasonable, or at least close. What is the value of query.toString()?

Erik

> private static Hits search(List fieldName, List fieldValue) {
>     Hits hits = null;
>
>     int fieldNameSize = fieldName.size();
>     int fieldValueSize = fieldValue.size();
>     if (fieldNameSize != fieldValueSize) {
>         return null;
>     }
>
>     IndexSearcher searcher = getSearcher();
>     if (searcher != null) {
>         BooleanQuery query = new BooleanQuery();
>         // populate the query with all terms
>         for (int i = 0; i < fieldNameSize; i++) {
>             String currentFieldName = (String) fieldName.get(i);
>             String currentFieldValue = (String) fieldValue.get(i);
>
>             StringTokenizer tokenizer = new StringTokenizer(currentFieldValue);
>             while (tokenizer.hasMoreTokens()) {
>                 String currentToken = tokenizer.nextToken();
>                 Term currentTerm = new Term(currentFieldName, currentToken);
>                 TermQuery termQuery = new TermQuery(currentTerm);
>
>                 query.add(termQuery, false, false);
>             }
>         }
>
>         // do the search
>         try {
>             // System.out.println("IndexManager query = " + query.toString());
>             hits = searcher.search(query);
>         }
>         catch (IOException ioe) {
>             LogManager.log(LogManager.LOG_ERROR, "Cannot search in index.", ioe);
>         }
>         finally {
>             try {
>                 searcher.close();
>             }
>             catch (IOException ioe) {
>                 LogManager.log(LogManager.LOG_WARNING, "Cannot close searcher in search method.", ioe);
>             }
>         }
>     }
>
>     return hits;
> }
do a simple search
Hello,

I would like to search a word in different fields of a document with an OR operator.

My fields are "id", "shortDesc" and "longDesc". In Java I want to search a word simultaneously in the "shortDesc" and "longDesc" fields.

For example:

doc1: id: 1
      shortDesc: a foo desc
      longDesc: a doc long desc

doc2: id: 2
      shortDesc: a doc short desc
      longDesc: a foo long desc

doc3: id: 3
      shortDesc: another short desc
      longDesc: another long desc

If the search word is "foo" I want to retrieve doc1 and doc3.

In my program, field names are stored in the fieldName list and the associated values are stored in fieldValue.

Thanks

private static Hits search(List fieldName, List fieldValue) {
    Hits hits = null;

    int fieldNameSize = fieldName.size();
    int fieldValueSize = fieldValue.size();
    if (fieldNameSize != fieldValueSize) {
        return null;
    }

    IndexSearcher searcher = getSearcher();
    if (searcher != null) {
        BooleanQuery query = new BooleanQuery();
        // populate the query with all terms
        for (int i=0; i
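[Editor's note: the OR-across-fields behaviour asked for here can be modelled in plain Java over the three example documents; a document matches when the word occurs in shortDesc OR longDesc. In Lucene this is a single BooleanQuery holding one TermQuery per field, each added as an optional clause, i.e. query.add(termQuery, false, false) as in the thread's code. OrSearchDemo is a hypothetical illustration, not Lucene code:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the multi-field OR search over the example docs.
public class OrSearchDemo {
    static final List<Map<String, String>> DOCS = List.of(
        Map.of("id", "1", "shortDesc", "a foo desc",         "longDesc", "a doc long desc"),
        Map.of("id", "2", "shortDesc", "a doc short desc",   "longDesc", "a foo long desc"),
        Map.of("id", "3", "shortDesc", "another short desc", "longDesc", "another long desc"));

    // Return the ids of docs containing the word in shortDesc OR longDesc.
    static List<String> search(String word) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> doc : DOCS) {
            if (doc.get("shortDesc").contains(word) || doc.get("longDesc").contains(word)) {
                ids.add(doc.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Matches doc1 and doc2 (as Erik notes in his reply), not doc3.
        System.out.println(search("foo")); // [1, 2]
    }
}
```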