RE: Query question
Otis,

Are you referring to this: "How do I retrieve all the values of a particular field that exists within an index, across all documents?" I need a query to do it; the only way clients access the index is via queries, so they cannot write the code in the FAQ entry above.

Thanks,
Rob

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 5:05 PM
To: Lucene Users List
Subject: Re: Query question

Go to the Lucene FAQ at jGuru.com and search for the word 'all'.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

> Hi all, I have a field called "echelon" that is assigned to certain files.
> Is there a query I can write that will give me all files that have this
> field? I have tried things like echelon:.+* and echelon:*; some give a
> query parser exception while others return nothing.
>
> Let me know,
> Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
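For anyone who can run code against the index (which, as noted above, the clients here cannot), the documents carrying a given field can be collected by walking the term dictionary rather than by query syntax. A minimal sketch against the Lucene 1.x-era `IndexReader` API, with the field name taken from the question (class and variable names are illustrative):

```java
import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

// Collect the ids of all documents that have at least one term in the
// given field, by enumerating that field's terms and their postings.
public class FieldScan {
    public static BitSet docsWithField(IndexReader reader, String field)
            throws IOException {
        BitSet result = new BitSet(reader.maxDoc());
        // terms(Term) positions the enum at the first term >= (field, "")
        TermEnum terms = reader.terms(new Term(field, ""));
        try {
            while (terms.term() != null && terms.term().field().equals(field)) {
                TermDocs docs = reader.termDocs(terms.term());
                while (docs.next()) {
                    result.set(docs.doc());
                }
                docs.close();
                if (!terms.next()) break;   // leave the loop at end of dictionary
            }
        } finally {
            terms.close();
        }
        return result;
    }
}
```

This only helps on the server/indexing side; it does not answer the question of doing the same thing purely through query syntax.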
Query question
Hi all,

I have a field called "echelon" that is assigned to certain files. Is there a query I can write that will give me all files that have this field? I have tried things like echelon:.+* and echelon:*; some give a query parser exception while others return nothing.

Let me know,
Rob
Checkpointable Index
Hi all,

We have a sandboxed file system which Lucene indexes. Periodically we dump the file system to disk (checkpoint it); can a Lucene index be checkpointed, then restored and used? Currently we simply rebuild the index, since that only takes a few minutes, but we would like the user to be able to take a snapshot of that file system and restore and use it without rebuilding the index.

Let me know. Thanks,
Rob
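For what it's worth, a Lucene index is just a flat directory of files ("segments", "deletable", "_xxx.*"), so a checkpoint can be taken by copying that directory while no IndexWriter has it open, and restored by copying it back. A minimal sketch of the copy step (plain java.io, not Lucene API; class name is illustrative):

```java
import java.io.*;

// Copy every regular file from one directory to another. Applied to a
// quiescent Lucene index directory, this captures a restorable snapshot.
public class IndexSnapshot {
    public static void copyDir(File src, File dst) throws IOException {
        if (!dst.exists() && !dst.mkdirs()) {
            throw new IOException("cannot create " + dst);
        }
        File[] files = src.listFiles();
        if (files == null) throw new IOException("not a directory: " + src);
        for (int i = 0; i < files.length; i++) {
            if (!files[i].isFile()) continue; // a flat index dir has no subdirs
            copyFile(files[i], new File(dst, files[i].getName()));
        }
    }

    private static void copyFile(File src, File dst) throws IOException {
        InputStream in = new FileInputStream(src);
        try {
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }
}
```

The important caveat is that the copy must happen while the index is quiescent; copying while a writer is merging segments can capture an inconsistent set of files.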
RE: Luke - Lucene Index Browser
Luke looks pretty slick. I was wondering how difficult it would be to add code to add fields, update fields, etc. I have written something similar to Luke, but next month I need to add graphical support to update, remove, and add fields.

Thanks,
Rob

-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Monday, July 14, 2003 11:48 AM
To: Lucene Users List
Subject: Luke - Lucene Index Browser

Dear Lucene Users,

Luke is a diagnostic tool for Lucene (http://jakarta.apache.org/lucene) indexes. It enables you to browse documents in existing indexes, perform queries, navigate through terms, optimize indexes and more. Please go to http://www.getopt.org/luke and give it a try. A Java WebStart version will be available soon.

--
Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (http://www.freebsd.org)
java.io.IOException: Cannot delete deletetable
Hi all,

I am intermittently getting the above exception while building an index. I have been trying for an hour to reproduce it, but can't as of yet. In any case, I was wondering if anyone knew anything about the above error and, if so, how to stop it from occurring. In the stack trace I printed out, it looked like the exception occurred in the rename method of FSDirectory. As soon as I can replicate it, I will post the exception and any additional information requested.

Thanks as always,
Rob
RE: java.io.IOException: Cannot delete deletetable
We use Windows and Linux, but I have only seen this error on Windows so far. I will check the jar file I am using to make sure it is the most recent; I am assuming the most recent is Lucene 1.3 RC1?

Thanks,
Rob

-Original Message-
From: Matt Tucker [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 19, 2003 2:03 PM
To: Lucene Users List
Subject: Re: java.io.IOException: Cannot delete deletetable

Rob,

Are you using the very latest Lucene code? The standard File.renameTo operation fails every once in a while, especially on Windows. I sent in a patch that was put in somewhat recently. It fixed all the errors we were seeing with renames.

Regards,
Matt
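For readers hitting the same symptom on an older jar, the shape of the workaround is to retry the rename briefly before giving up, since File.renameTo can fail transiently on Windows while another thread or the OS still holds a handle. A standalone sketch of that idea (illustrative only, not the actual Lucene patch):

```java
import java.io.File;

// Retry a File.renameTo a few times with a short pause between attempts,
// to ride out transient failures seen on Windows file systems.
public class RetryRename {
    public static boolean renameWithRetry(File from, File to, int attempts) {
        for (int i = 0; i < attempts; i++) {
            if (from.renameTo(to)) return true;
            try {
                Thread.sleep(100); // give the OS a moment to release handles
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```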
RE: java.lang.IllegalArgumentException: attempt to access a deleted document
I added the following code:

    for (int i = 0; i < numOfDocs; i++) {
        if (!reader.isDeleted(i)) {
            doc = reader.document(i);
            docs[i] = doc.get(SearchEngineConstants.REPOSITORY_PATH);
        }
    }
    return docs;

but it never goes into the if statement; for every value of i, isDeleted(i) is returning true?!? Am I doing something wrong? I was trying to do what Doug outlined below.

Thanks,
Rob

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 04, 2003 12:34 PM
To: Lucene Users List
Subject: Re: java.lang.IllegalArgumentException: attempt to access a deleted document

Rob Outar wrote:

    public synchronized String[] getDocuments() throws IOException {
        IndexReader reader = null;
        try {
            reader = IndexReader.open(this.indexLocation);
            int numOfDocs = reader.numDocs();
            String[] docs = new String[numOfDocs];
            Document doc = null;
            for (int i = 0; i < numOfDocs; i++) {
                doc = reader.document(i);
                docs[i] = doc.get(SearchEngineConstants.REPOSITORY_PATH);
            }
            return docs;
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }

The limit of your iteration should be IndexReader.maxDoc(), not IndexReader.numDocs():
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#maxDoc()

Also, you should first check that each document is not deleted before calling IndexReader.document(int):
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#isDeleted(int)

Doug
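Putting Doug's two corrections together, the loop should bound on maxDoc() and skip deleted slots; collecting into a list also avoids the null gaps a fixed-size array would have. A sketch, assuming the same reader and constants as the code above:

```java
// Iterate all live documents: maxDoc() is the upper bound on document
// numbers, and slots belonging to deleted documents must be skipped.
int maxDoc = reader.maxDoc();
List paths = new ArrayList();
for (int i = 0; i < maxDoc; i++) {
    if (reader.isDeleted(i)) {
        continue; // deleted slot; document(i) would throw here
    }
    Document doc = reader.document(i);
    paths.add(doc.get(SearchEngineConstants.REPOSITORY_PATH));
}
String[] docs = (String[]) paths.toArray(new String[paths.size()]);
```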
Analyzer Incorrect?
Hi all,

Sorry for the flood of questions this week; clients finally started using the search engine I wrote, which uses Lucene. When I first started developing with Lucene, the Analyzers it came with did some odd things, so I decided to implement my own, but it is not working the way I expect it to. First and foremost, I would like to have case-insensitive searches, and I do not want to tokenize the fields. No field will ever have a space in it, so there is no need to tokenize. I came up with this Analyzer, but case still seems to be an issue:

    public TokenStream tokenStream(String field, final Reader reader) {
        // do not tokenize any field
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };
        // case-insensitive search
        t = new LowerCaseFilter(t);
        return t;
    }

Is there anything I am doing wrong in the Analyzer I have written?

Thanks,
Rob
RE: Analyzer Incorrect?
Yeah, it has been a bad week. I don't think QueryParser is lowercasing my fields; maybe it is something I am doing wrong:

    public synchronized String[] queryIndex(String query)
            throws ParseException, IOException {
        checkForIndexChange();
        QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
        this.query = p.parse(query);
        Hits hits = this.searcher.search(this.query);
        return buildReturnArray(hits);
    }

When I create the QueryParser I do not want it to have a default field, since clients can query on whatever field they want. I use my Analyzer, which I do not think is lowercasing the fields, because I have tested querying with all lowercase (got results) and with mixed case (no results), so I think my code or my analyzer is hosed.

Thanks,
Rob

-Original Message-
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2003 9:09 AM
To: Lucene Users List
Subject: Re: Analyzer Incorrect?

On Friday 04 April 2003 05:24, Rob Outar wrote:

> Hi all, Sorry for the flood of questions this week, clients finally
> started using the search engine I wrote which uses Lucene. When I first
> started

Yup... that's the root of all evil. :-) (I'm in a similar situation, going through user acceptance test as we speak... and getting ready to do a second version that'll have more advanced metadata-based search using Lucene.)

> developing with Lucene the Analyzers it came with did some odd things so
> I decided to implement my own but it is not working the way I expect it
> to. First and foremost I would like to have case insensitive searches and
> I do not want to tokenize the fields. No field will ever have a space

If you don't need to tokenize a field, you don't need an analyzer either. However, to get case-insensitive search, you should lower-case field contents before adding them to the document. QueryParser will do lower-casing for search terms automatically (if you are using it), so matching should work fine then.
-+ Tatu +-
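Tatu's last point is the practical fix: for untokenized fields, skip analysis and lower-case the value once, at index time. A sketch against the Lucene 1.x API (field name, value, and writer are illustrative):

```java
// Keyword fields are stored and indexed without tokenization, so the
// only normalization needed is lower-casing the value ourselves; the
// query side then matches because QueryParser lower-cases terms.
Document doc = new Document();
doc.add(Field.Keyword("echelon", value.toLowerCase()));
writer.addDocument(doc);
```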
RE: Indexing Growth
Would there be any abnormal effects if, after adding a document, you called optimize()? I am still seeing large growth from setting a field. When I set a field I:

1. Get the document.
2. Remove the field.
3. Write the document to the index.
4. Get the document again.
5. Add the new field object.
6. Write the document to the index.
7. Call optimize().

From writing out my steps, it looks like I should write a set method instead of treating set as removeField() plus addField(); I thought combining those two would equal set, which it does, but it seems horribly inefficient. In any case, would the above cause the index to grow from, say, 10.5 megs to 31 megs? Is there an efficient way to implement a set? For example, if there were a field/value pair of book/hamlet, but now we wanted to set book = none? Please keep in mind there could be multiple fields named book, so it is not simply a matter of removing the field book and re-adding it. Anyhow, let me know your thoughts.

Thanks,
Rob

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 11:35 AM
To: Lucene Users List
Subject: RE: Indexing Growth

Funny how this is the outcome of 90% of the problems people have with software - their own mistakes :) Regarding reindexing - no need for any explicit calls. When you add a document to the index it is indexed right away. You will have to detect index change (methods for that are there) and re-open the IndexSearcher in order to see newly added/indexed documents.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

> I found the freakin' problem; I am going to kill my co-worker when he
> gets in. He was removing a field and adding the same field back for each
> document in the index, in a piece of code I did not notice until now.
> He is so dead. I commented out that piece of code, queried to my heart's
> content, and the index has not changed. Heck, the tool is like super
> fast now. One last concern is about the re-indexing thing: when does
> that occur? optimize()?
I am curious what method would cause a reindex. I want to thank all of you for your help; it was truly appreciated!

Thanks,
Rob
RE: Indexing Growth
I took out the optimize() after the write and the index is growing, but at like a 1 KB rate; now there are tons of 1 KB files. I assume optimize() would fix this? What is a good rule of thumb for calling optimize()? Will Lucene ever invoke an optimize() on its own?

Thanks,

Rob Outar
OneSAF AI -- SAIC
Software/Data Engineer
321-235-7660
[EMAIL PROTECTED]
Querying Question
Hi all,

I am a little fuzzy on complex querying using AND, OR, etc. For example, I have the following name/value pairs:

file 1 = name = checkpoint, value = filename_1
file 2 = name = checkpoint, value = filename_2
file 3 = name = checkpoint, value = filename_3
file 4 = name = checkpoint, value = filename_4

I ran the following query:

    name:"checkpoint" AND value:"filenane_1"

Instead of getting back file 1, I got back all four files? Then, after trying different things, I did:

    +(name:"checkpoint") AND +(value:"filenane_1")

and it then returned file 1. Our project queries solely on name/value pairs, and we need the ability to query using AND, OR, NOT, etc. What is the correct syntax for such queries? The code I use is:

    QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
    this.query = p.parse(query.toLowerCase());
    Hits hits = this.searcher.search(this.query);

Thanks as always,
Rob
RE: Querying Question
RepositoryIndexAnalyzer:

    /**
     * Creates a TokenStream which tokenizes all the text in the provided Reader.
     * Default implementation forwards to tokenStream(Reader) for compatibility
     * with older versions. Override to allow the Analyzer to choose a strategy
     * based on document and/or field.
     * @param field is the name of the field
     * @param reader is the data
     * @return a token stream
     * @build 10
     */
    public TokenStream tokenStream(String field, final Reader reader) {
        // do not tokenize any field
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };
        // case-insensitive search
        t = new LowerCaseFilter(t);
        return t;
    }

but earlier when I did a query, case became an issue. I am not sure why, as the analyzer should have lowercased the token, but it did not.

Thanks,
Rob

-Original Message-
From: Eric Isakson [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2003 5:23 PM
To: Lucene Users List
Subject: RE: Querying Question

Your query.toLowerCase() lowercased your query to become:

    name:"checkpoint" and value:"filenane_1"

The keyword AND must be uppercase when the query parser gets hold of it. If your RepositoryIndexAnalyzer lowercases its tokens, you don't need to do query.toLowerCase(). If it doesn't lowercase its tokens, you may want to modify it so that it does.

Eric
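Since the parser's operators are case-sensitive, one way to sidestep toLowerCase() on the whole query string is to build the boolean query programmatically and lower-case only the terms. A sketch against the Lucene 1.x BooleanQuery API (the field names match the example above; the required/prohibited flags correspond to + and -):

```java
// add(query, required, prohibited): both clauses required,
// i.e. the same as +name:checkpoint +value:filename_1
BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("name", "checkpoint".toLowerCase())), true, false);
q.add(new TermQuery(new Term("value", "filename_1".toLowerCase())), true, false);
Hits hits = searcher.search(q);
```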
RE: Indexing Growth
Hi all,

This is too odd and I do not even know where to start. We built a Windows Explorer-type tool that indexes all files in a sandboxed file system. Each Lucene document contains attributes like path, parent directory, last modified date, file_lock, etc. When we display the files in a given directory through the tool, we query the index about 5 times for each file in the repository, so we can display all the attributes in the index for that file. So, for example, if there are 5 files in the directory and each file has 6 attributes, about 30 term queries are executed. The initial index when built is about 10.4 megs; after accessing about 3 or 4 directories, the index size increased to over 100 megs, and we did not add anything!! All we are doing is querying!! Yesterday, after querying became ungodly slow, we looked at the index size: it had grown from 10 megs to 1.5 GB (granted, we tested the tool all morning). But I have no idea why the index is growing like this. ANY help would be greatly appreciated.

Thanks,
Rob

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2003 3:32 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: RE: Indexing Growth

I reuse the same searcher, analyzer and Query object; I don't think that should cause the problem.

Thanks,
Rob

-Original Message-
From: Alex Murzaku [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2003 3:22 PM
To: 'Lucene Users List'
Subject: RE: Indexing Growth

I don't know if I remember this correctly: I think a file is created for every query (term), but the file should disappear after the query is completed.

-Original Message-
From: Rob Outar [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2003 3:13 PM
To: Lucene Users List
Subject: RE: Indexing Growth

Dang, I must be doing something crazy, because all my client app does is search and the index size increases. I do not add anything.

Thanks,
Rob

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2003 3:07 PM
To: Lucene Users List
Subject: Re: Indexing Growth

Only when you add new documents to it.

Otis

--- Rob Outar [EMAIL PROTECTED] wrote:

> Hi all, Will the index grow based on queries alone? I build my index,
> then run several queries against it, and afterwards I check the size of
> the index; in some cases it has grown quite a bit although I did not add
> anything??? Anyhow, please let me know the cases when the index will
> grow.
>
> Thanks,
> Rob
RE: Indexing Growth
Additional info on the problem: the index contains several 1 KB files, and several files that have different names but the same file size. It looks like the files that comprise the index are being duplicated, causing the index to become huge.

Thanks,
Rob
RE: Indexing Growth
Just about everything calls getValue:

    public synchronized String getValue(String key, File file)
            throws ParseException, IOException {
        Document doc = getDocument(file);
        return doc.get(key.toLowerCase());
    }

which calls getDocument:

    private synchronized Document getDocument(File file)
            throws MalformedURLException, IOException {
        checkForIndexChange();
        Term t = new Term(PATH, file.toURI().toString().toLowerCase());
        TermQuery tQ = new TermQuery(t);
        Hits hits = this.searcher.search(tQ);
        if (hits.length() == 1) {
            return hits.doc(0);
        }
        // this should never happen; a URL cannot return 2 hits --
        // that would mean the same file has been indexed twice
        else {
            return null;
        }
    }

Thanks,
Rob

-Original Message-
From: Michael Barry [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 9:20 AM
To: Lucene Users List
Subject: Re: Indexing Growth

Sounds like you either have an indexer that's run amok (maybe a background process that's continually re-indexing your sandbox, or expanding outside your sandbox) or your query code is doing more than querying. It's not behaviour I've seen. Without a snippet of query code, it's going to be hard to help.
RE: Indexing Growth
After building the index for the first time:

_l1d.f1 _l1d.f3 _l1d.f5 _l1d.f7 _l1d.f9 _l1d.fdx _l1d.frq _l1d.tii deletable _l1d.f2 _l1d.f4 _l1d.f6 _l1d.f8 _l1d.fdt _l1d.fnm _l1d.prx _l1d.tis segments

After running the first query to get all attributes from all files in the given directory (there were 17 files, each file has 5 attributes, so 85 queries were run):

_l1j.f1 _l1p.f9 _l21.f3 _l27.fdx _l2j.f5 _l2p.prx _l31.f7 _l3j.f1 _l3p.f9 _l41.f3 _l44.fdx _l1j.f2 _l1p.fdt _l21.f4 _l27.frq _l2j.f6 _l2p.tis _l31.f8 _l3j.f2 _l3p.fdt _l41.f4 _l44.frq _l1j.f3 _l1p.fdx _l21.f5 _l27.prx _l2j.f7 _l2v.f1 _l31.f9 _l3j.f3 _l3p.fdx _l41.f5 _l44.prx _l1j.f4 _l1p.frq _l21.f6 _l27.tis _l2j.f8 _l2v.f2 _l31.fdt _l3j.f4 _l3p.frq _l41.f6 _l44.tis _l1j.f5 _l1p.prx _l21.f7 _l2d.f1 _l2j.f9 _l2v.f3 _l31.fdx _l3j.f5 _l3p.prx _l41.f7 _l47.f1 _l1j.f6 _l1p.tis _l21.f8 _l2d.f2 _l2j.fdt _l2v.f4 _l31.frq _l3j.f6 _l3p.tis _l41.f8 _l47.f2 _l1j.f7 _l1v.f1 _l21.f9 _l2d.f3 _l2j.fdx _l2v.f5 _l31.prx _l3j.f7 _l3v.f1 _l41.f9 _l47.f3 _l1j.f8 _l1v.f2 _l21.fdt _l2d.f4 _l2j.frq _l2v.f6 _l31.tis _l3j.f8 _l3v.f2 _l41.fdt _l47.f4 _l1j.f9 _l1v.f3 _l21.fdx _l2d.f5 _l2j.prx _l2v.f7 _l37.f1 _l3j.f9 _l3v.f3 _l41.fdx _l47.f5 _l1j.fdt _l1v.f4 _l21.frq _l2d.f6 _l2j.tis _l2v.f8 _l37.f2 _l3j.fdt _l3v.f4 _l41.frq _l47.f6 _l1j.fdx _l1v.f5 _l21.prx _l2d.f7 _l2p.f1 _l2v.f9 _l37.f3 _l3j.fdx _l3v.f5 _l41.prx _l47.f7 _l1j.frq _l1v.f6 _l21.tis _l2d.f8 _l2p.f2 _l2v.fdt _l37.f4 _l3j.frq _l3v.f6 _l41.tis _l47.f8 _l1j.prx _l1v.f7 _l27.f1 _l2d.f9 _l2p.f3 _l2v.fdx _l37.f5 _l3j.prx _l3v.f7 _l44.f1 _l47.f9 _l1j.tis _l1v.f8 _l27.f2 _l2d.fdt _l2p.f4 _l2v.frq _l37.f6 _l3j.tis _l3v.f8 _l44.f2 _l47.fdt _l1p.f1 _l1v.f9 _l27.f3 _l2d.fdx _l2p.f5 _l2v.prx _l37.f7 _l3p.f1 _l3v.f9 _l44.f3 _l47.fdx _l1p.f2 _l1v.fdt _l27.f4 _l2d.frq _l2p.f6 _l2v.tis _l37.f8 _l3p.f2 _l3v.fdt _l44.f4 _l47.fnm _l1p.f3 _l1v.fdx _l27.f5 _l2d.prx _l2p.f7 _l31.f1 _l37.f9 _l3p.f3 _l3v.fdx _l44.f5 _l47.frq _l1p.f4 _l1v.frq _l27.f6 _l2d.tis _l2p.f8 _l31.f2 _l37.fdt _l3p.f4 _l3v.frq _l44.f6 _l47.prx _l1p.f5 _l1v.prx _l27.f7 _l2j.f1 _l2p.f9 _l31.f3 _l37.fdx _l3p.f5 _l3v.prx _l44.f7 _l47.tii _l1p.f6 _l1v.tis _l27.f8 _l2j.f2 _l2p.fdt _l31.f4 _l37.frq _l3p.f6 _l3v.tis _l44.f8 _l47.tis _l1p.f7 _l21.f1 _l27.f9 _l2j.f3 _l2p.fdx _l31.f5 _l37.prx _l3p.f7 _l41.f1 _l44.f9 deletable _l1p.f8 _l21.f2 _l27.fdt _l2j.f4 _l2p.frq _l31.f6 _l37.tis _l3p.f8 _l41.f2 _l44.fdt segments

I have no reason to add anything to the index; all I want to do is fetch the attributes for the list of files in that directory.

Thanks,
Rob

-Original Message-
From: Ian Lea [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 9:24 AM
To: Rob Outar
Cc: Lucene Users List
Subject: RE: Indexing Growth

What does the index directory look like before and after running queries? Are files growing or being added? Which files? How many documents are there in the index before and after? Are you absolutely 100% positive there is no way that your application is adding entries to the index? That still has to be the most likely explanation, I think.

--
Ian.
[EMAIL PROTECTED]
Yesterday, after querying became ungodly slow, we looked at the index size: it had grown from 10 megs to 1.5 GB (granted, we tested the tool all morning). But I have no idea why the index is growing like this. ANY help would be greatly appreciated. Thanks, Rob -Original Message- From: Rob Outar [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:32 PM To: Lucene Users List; [EMAIL PROTECTED] Subject: RE: Indexing Growth I reuse the same searcher, analyzer and Query object; I don't think that should cause the problem. Thanks, Rob -Original Message- From: Alex Murzaku [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:22 PM To: 'Lucene Users List' Subject: RE
RE: Indexing Growth
/**
 * Returns true if the index has changed.
 * @return true iff the index has been changed since the IndexSearcher
 * class was created.
 * @build 10
 */
private synchronized boolean hasIndexChanged() {
    try {
        long temp = IndexReader.lastModified(this.indexLocation);
        return temp > this.lastModified;
    }
    //assume it has changed
    catch (IOException e) {
        return true;
    }
}

/**
 * Checks whether the index has changed since the IndexSearcher was
 * created; if it has, the IndexSearcher is reinitialized.
 * @build 10
 */
private synchronized void checkForIndexChange() {
    try {
        if (hasIndexChanged()) {
            this.searcher = new IndexSearcher(this.indexLocation);
        }
    }
    catch (IOException e) {
    }
}

Thanks, Rob -Original Message- From: Ian Lea [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 02, 2003 10:32 AM To: Rob Outar Cc: Lucene Users List Subject: RE: Indexing Growth They look like the type of file name that would be created when documents were added to the index. So I still think something is adding stuff to your index. Could it be an external process as someone suggested? Does the index grow even if you don't search? In the code you posted, what does checkForIndexChange() do? Yes, I can guess what it is supposed to do, but is it perhaps doing something else as well or instead, directly or indirectly? -- Ian. [EMAIL PROTECTED] (Rob Outar) wrote After building the index for the first time: _l1d.f1 _l1d.f3 _l1d.f5 _l1d.f7 _l1d.f9 _l1d.fdx _l1d.frq _l1d.tii deletable _l1d.f2 _l1d.f4 _l1d.f6 _l1d.f8 _l1d.fdt _l1d.fnm _l1d.prx _l1d.tis segments After running the first query to get all attributes from all files in the given directory (there were 17 files, each file has 5 attributes, so 85 queries were run): _l1j.f1 _l1p.f9 _l21.f3 _l27.fdx _l2j.f5 _l2p.prx _l31.f7 _l3j.f1 _l3p.f9 _l41.f3 _l44.fdx _l1j.f2 _l1p.fdt _l21.f4 _l27.frq _l2j.f6 _l2p.tis _l31.f8 _l3j.f2 _l3p.fdt _l41.f4 _l44.frq ... 
-- Searchable personal storage and archiving from http://www.digimem.net/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Indexing Growth
I found the freakin problem, I am going to kill my co-worker when he gets in. He was removing a field and adding the same field back for each document in the index, in a piece of code I did not notice until now. He is so dead. I commented out that piece of code, queried to my heart's content, and the index has not changed. Heck, the tool is like super fast now. One last concern is about the re-indexing thing: when does that occur? optimize()? I am curious what method would cause a reindex. I want to thank all of you for your help, it was truly appreciated! Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Indexing Growth
Hi all, Will the index grow based on queries alone? I build my index, then run several queries against it and afterwards I check the size of the index and in some cases it has grown quite a bit although I did not add anything??? Anyhow please let me know the cases when the index will grow. Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Indexing Growth
Dang I must be doing something crazy cause all my client app does is search and the index size increases. I do not add anything. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:07 PM To: Lucene Users List Subject: Re: Indexing Growth Only when you add new documents to it. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hi all, Will the index grow based on queries alone? I build my index, then run several queries against it and afterwards I check the size of the index and in some cases it has grown quite a bit although I did not add anything??? Anyhow please let me know the cases when the index will grow. Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://platinum.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Indexing Growth
I reuse the same searcher, analyzer and Query object I don't think that should cause the problem. Thanks, Rob -Original Message- From: Alex Murzaku [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:22 PM To: 'Lucene Users List' Subject: RE: Indexing Growth I don't know if I remember this correctly: I think for every query (term) is created a file but the file should disappear after the query is completed. -Original Message- From: Rob Outar [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:13 PM To: Lucene Users List Subject: RE: Indexing Growth Dang I must be doing something crazy cause all my client app does is search and the index size increases. I do not add anything. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2003 3:07 PM To: Lucene Users List Subject: Re: Indexing Growth Only when you add new documents to it. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hi all, Will the index grow based on queries alone? I build my index, then run several queries against it and afterwards I check the size of the index and in some cases it has grown quite a bit although I did not add anything??? Anyhow please let me know the cases when the index will grow. Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://platinum.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
write.lock
Hi all, I am experiencing an odd problem where sometimes the write.lock file gets left behind. I have looked over all my code and I close IndexWriter after I use it. I do a lot of batch processing where I write tons of files to the index. Has anyone run across this before? Is IndexWriter the only class that creates the write.lock file? When is that write.lock file deleted? Let me know. Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
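[Editor's note] The usual cause of a leftover write.lock is a writer that is not closed on every code path, e.g. when an exception escapes mid-batch; wrapping close() in a finally block guarantees the lock is released. (Note also that in Lucene, IndexWriter is not the only holder of this lock: an IndexReader used to delete documents acquires it too.) The sketch below is a toy stand-in for the lock lifecycle, not Lucene's actual IndexWriter — ToyWriter is a hypothetical class, using only the JDK:

```java
import java.io.File;
import java.io.IOException;

// Toy stand-in for the write.lock lifecycle described in the thread: the
// lock file is created when the writer opens and must be deleted on close.
// If close() is skipped on an exception path, the lock is left behind.
public class WriteLockSketch {
    static class ToyWriter implements AutoCloseable {
        private final File lock;
        ToyWriter(File indexDir) throws IOException {
            lock = new File(indexDir, "write.lock");
            if (!lock.createNewFile()) throw new IOException("index locked");
        }
        void addDocument(String doc) { /* index it */ }
        @Override public void close() { lock.delete(); } // releases the lock
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        new File(dir, "write.lock").delete(); // clear any stale lock first
        ToyWriter w = new ToyWriter(dir);
        try {
            w.addDocument("batch item");
        } finally {
            w.close(); // runs even if addDocument throws mid-batch
        }
    }
}
```

The try/finally (or try-with-resources in later Java) is the whole fix: no matter which document in the batch throws, the lock file is removed.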
Quick Question On Adding Fields
What happens if I add the same name/value pair to a Lucene Document? Does it override it? Does it append it so you have duplicates? Let me know, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Quick Question On Adding Fields
I ran a little test where I did: doc.add(new Field(name,value)); doc.add(new Field(name,value)); Then got a list of the fields for that doc and sure enough it is in there twice. So it appends the value to the field, even if the value already exists. Thanks, Rob -Original Message- From: David Spencer [mailto:[EMAIL PROTECTED] Sent: Thursday, March 20, 2003 8:53 AM To: Lucene Users List Subject: Re: Quick Question On Adding Fields Rob Outar wrote: What happens if I add the same name/value pair to a Lucene Document? Does it override it? Does it append it so you have duplicates? I believe it 'appends' in the sense that if you add 2 fields with the same name then the Document has the union of the content of both fields added, and then you can search on anything in either or both of the field values you added. One use case is if you're indexing html and you want a field for the title, a field for the body, and an easy way for users to refer to both the title and the body in a query. So when you add a Field for the title named title, you also add one with a name like contents, and then you add a field for the body named body, and then you pass the same data and add another field named contents. Then, voila, a search on contents:foo returns matches against the title and the body. Let me know, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Searching for hyphenated terms
I had similar problems that were solved with this Analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; //case insensitive search t = new LowerCaseFilter(t); return t; } Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Thursday, March 13, 2003 11:22 AM To: Lucene Users List Subject: Re: Searching for hyphenated terms Make a custom Analyzer. They are super simple to write. Take pieces of WhitespaceAnalyzer and the Standard one. Otis --- Sieretzki, Dionne R, SOLGV [EMAIL PROTECTED] wrote: I have seen some previous postings about Escape woes and Hyphens not matching, but I haven't seen any resolutions to an issue I've been trying to work out. I don't want my search field to be case sensitive, so I used StandardAnalyzer. The search field also has corresponding entries that may or may not contain hyphens or other special characters. If the field is not tokenized, very few search terms result in matches. It appears that terms are only matched if a wildcard is used, such as: Entered: ADOG / Actual Query is: adog / No match on an exact term Entered: ADOG* / Actual Query is: ADOG* / Match found Entered: AAA-ADOG / Actual Query is: aaa -adog / No match Entered: AAA-ADOG / Actual Query is: aaa adog / No match Entered: AAA?ADOG / Actual Query is: aaa?adog / Match found Entered: DOG.2 / Actual Query is: dog.2 / No match Entered: DOG?2 / Actual Query is: DOG?2 / Match found If the field is tokenized, then even more mixed results are produced. 
Entered: ADOG / Actual Query is: adog / Match found for exact term Entered: ADOG* / Acutal Query is: ADOG* / No match Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found Entered: AAA-ADOG / Actual Query is: aaa adog / Match found Entered: DOG.2 / Actual Query is: adog.2 / Match found Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match Entered: AAA-DOG-BBB / Actual Query is: aaa dog bbb / No match Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches Entered: ADOG-I40 / Actual Query is: adog-i40 / Match found for exact term Can anyone recommend the right Analyzer to use that isn't case sensitive and matches on both hyphenated and non-hyphenated terms? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Web Hosting - establish your business online http://webhosting.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
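[Editor's note] Rob's keep-everything, lowercase-only analyzer can be modeled without Lucene at all. The toy below (plain Java, hypothetical class and method names) shows why hyphenated terms match under that scheme: the entire field value becomes a single lowercased token, so hyphens and dots survive intact and only case is folded:

```java
import java.util.Locale;

// Toy model of the "do not tokenize, just lowercase" analyzer from the
// thread: the whole field value becomes a single lowercased token, so
// hyphens and dots survive intact and only case is folded.
public class WholeStringAnalyzerSketch {
    // Hypothetical helper, not a Lucene API: emulates a CharTokenizer whose
    // isTokenChar() always returns true, followed by a LowerCaseFilter.
    static String analyze(String fieldValue) {
        return fieldValue.toLowerCase(Locale.ROOT);
    }

    // A term query under this scheme is an exact match on the single token.
    static boolean matches(String indexedValue, String queryTerm) {
        return analyze(indexedValue).equals(analyze(queryTerm));
    }

    public static void main(String[] args) {
        // "AAA-ADOG" is indexed as one token "aaa-adog", so the hyphenated
        // query matches regardless of case...
        System.out.println(matches("AAA-ADOG", "aaa-adog")); // true
        // ...and StandardAnalyzer-style splitting on '-' never happens.
        System.out.println(matches("DOG.2", "dog.2"));       // true
        System.out.println(matches("AAA-ADOG", "adog"));     // false
    }
}
```

The trade-off, implied by the thread, is that partial-word matches are gone: only whole-value terms (or wildcards) hit, because there is exactly one token per field.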
RE: OutOfMemoryException while Indexing an XML file
We are aware of DOM limitations/memory problems, but I am using SAX to parse the file and index elements and attributes in my content handler. Thanks, Rob -Original Message- From: Tatu Saloranta [mailto:[EMAIL PROTECTED]] Sent: Friday, February 14, 2003 8:18 PM To: Lucene Users List Subject: Re: OutOfMemoryException while Indexing an XML file On Friday 14 February 2003 07:27, Aaron Galea wrote: I had this problem when using xerces to parse xml documents. The problem I think lies in the Java garbage collector. The way I solved it was to create It's unlikely that GC is the culprit. Current ones are good at purging objects that are unreachable, and only throw OutOfMem exception when they really have no other choice. Usually it's the app that has some dangling references to objects that prevent GC from collecting objects not useful any more. However, it's good to note that Xerces (and DOM parsers in general) generally use more memory than the input XML files they process; this because they usually have to keep the whole document struct in memory, and there is overhead on top of text segments. So it's likely to be at least 2 * input file size (files usually use UTF-8 which most of the time uses 1 byte per char; in memory 16-bit unicode-2 chars are used for performance), plus some additional overhead for storing element structure information and all that. And since default max. java heap size is 64 megs, big XML files can cause problems. More likely however is that references to already processed DOM trees are not nulled in a loop or something like that? Especially if doing one JVM process for item solves the problem. a shell script that invokes a java program for each xml file that adds it to the index. -+ Tatu +- - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: OutOfMemoryException while Indexing an XML file
So to the best of your knowledge the Lucene Document object should not cause the exception even though the XML file is huge and 1000's of fields are being added to the Lucene Document object? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday, February 14, 2003 8:21 AM To: Lucene Users List Subject: Re: OutOfMemoryException while Indexing an XML file Nothing in the code snippet you sent would cause that exception. If I were you I'd run it under a profiler to quickly see where the leak is. You can even use something free like JMP. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hi all, I was using the sample code provided I believe by Doug Cutting to index an XML file; the XML file was 2 megs (kinda large), but while adding fields to the Document object I got an OutOfMemoryException. I work with XML files a lot; I can easily parse that 2 meg file into a DOM tree, and I can't imagine a Lucene document being larger than a DOM tree. Pasted below is the SAX handler.

public class XMLDocumentBuilder extends DefaultHandler {

    /** A buffer for each XML element */
    private StringBuffer elementBuffer = new StringBuffer();
    private Document mDocument;

    public void buildDocument(Document doc, String xmlFile) throws IOException, SAXException {
        this.mDocument = doc;
        SAXReader.parse(xmlFile, this);
    }

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementBuffer.setLength(0);
        if (atts != null) {
            for (int i = 0; i < atts.getLength(); i++) {
                String attname = atts.getLocalName(i);
                mDocument.add(new Field(attname, atts.getValue(i), true, true, true));
            }
        }
    }

    // called when cdata found
    public void characters(char[] text, int start, int length) {
        elementBuffer.append(text, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        mDocument.add(Field.Text(localName, elementBuffer.toString()));
    }

    public Document getDocument() {
        return mDocument;
    }
}

Any help would be appreciated. 
Thanks, Rob - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Shopping - Send Flowers for Valentine's Day http://shopping.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
write.lock file
Hello all, This is the first time I have encountered this in 3 months of testing, the above file got created, not sure how or when, but every time I try to write to the index I get an IOException about the indexing being locked. It is obviously due to that file but what would cause that lock to get created and not removed? Let me know. Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: Searches are not case insensitive
From briefly looking at the code it looks like the field does not get touched it seems like the only part that gets converted to lower case is the value, so I am assuming that the field name is case sensitive but the value is not? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 25, 2002 8:25 AM To: Lucene Users List Subject: Re: Searches are not case insensitive Why not add print statements to your analyzer to ensure that what you think is happening really is happening? Token has an attribute called 'text' that you could print, I believe. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hello all, I created the following analyzer so that clients could pose case insensitive searches but queries are still case sensitive: // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; //case insensitive search t = new LowerCaseFilter(t); return t; } I use that index when I create a new instance of IndexWriter and when I use QueryPaser, I am not sure why my searches are still case dependent. Any help would be appreciated. Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Mail Plus Powerful. Affordable. Sign up now. http://mailplus.yahoo.com -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: Slash Problem
I don't know if this helps but I had exact same problem, I then stored the URI instead of the path, I was then able to search on the URI. Thanks, Rob -Original Message- From: Terry Steichen [mailto:[EMAIL PROTECTED]] Sent: Monday, November 25, 2002 11:53 AM To: Lucene Users Group Subject: Slash Problem I've got a Text field (tokenized, indexed, stored) called 'path' which contains a string in the form of '1102\A3345-12RT.XML'. When I submit a query like path:1102* it works fine. But, when I try to be more specific (such as path:1102\a* or path:1102*a*) it fails. I've tried escaping the slash (path:1102\\a*) but that also fails. I'm using the StandardAnalyzer and the default QueryParser. Could anyone suggest what's going wrong here? Regards, Terry -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
How does delete work?
Hello all, I used the delete(Term) method, then I looked at the index files; only one file changed: _1tx.del. I found references to the file still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
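[Editor's note] What Rob observed is how Lucene deletes work: delete(Term) only records tombstone bits in a per-segment .del bit vector (hence the lone _1tx.del change), and the deleted documents' data stays on disk until segments are merged or optimize() rewrites them. The toy below models that two-phase scheme with only the JDK (TombstoneDeleteSketch is a hypothetical class, not Lucene code):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of tombstone-style deletion: a delete only flips a bit in a
// side structure (playing the role of the .del file); the document's data
// stays in place until a compaction (Lucene: merge/optimize) rewrites the
// storage without the tombstoned docs.
public class TombstoneDeleteSketch {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet(); // stands in for the .del bit vector

    int add(String doc) { docs.add(doc); return docs.size() - 1; }

    void delete(int docId) { deleted.set(docId); } // cheap: no data rewritten

    boolean isLive(int docId) { return !deleted.get(docId); }

    // Searches must skip tombstoned docs even though their bytes are present.
    List<String> liveDocs() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++)
            if (isLive(i)) out.add(docs.get(i));
        return out;
    }

    // Analogue of optimize(): rewrite storage without the deleted docs.
    void compact() {
        List<String> live = liveDocs();
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }

    public static void main(String[] args) {
        TombstoneDeleteSketch idx = new TombstoneDeleteSketch();
        int a = idx.add("doc-a");
        idx.add("doc-b");
        idx.delete(a); // only a tombstone bit flips; "doc-a" bytes still stored
        System.out.println(idx.liveDocs()); // [doc-b]
    }
}
```

This is why references to a deleted document can still be found in the other index files: they are dead postings that every search skips, reclaimed only at merge time.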
Updating documents
I have something odd going on: I have code that updates documents in the index, so I have to delete the document and then re-add it. When I re-add the document I immediately do a search on the newly added field, which fails. However, if I rerun the query a second time it works?? I have the Searcher class as an attribute of my search class; does it not see the new changes? Seems like when it is reinitialized with the changed index it is then able to search on the newly added field?? Let me know if anyone has encountered this. Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: Updating documents
There is a reloading issue but I do not think lastModified is it: static long lastModified(Directory directory) Returns the time the index in this directory was last modified. static long lastModified(File directory) Returns the time the index in the named directory was last modified. static long lastModified(String directory) Returns the time the index in the named directory was last modified. Do I need to create a new instance of IndexSearcher each time I search? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday, November 22, 2002 12:20 PM To: Lucene Users List Subject: Re: Updating documents Don't you have to make use of lastModified method (I think in IndexSearcher), to 'reload' your instance of IndexSearcher? I'm pulling this from some old, not very fresh memory Otis --- Rob Outar [EMAIL PROTECTED] wrote: I have something odd going on, I have code that updates documents in the index so I have to delete it and then re add it. When I re-add the document I immediately do a search on the newly added field which fails. However, if I rerun the query a second time it works?? I have the Searcher class as an attribute of my search class, does it not see the new changes? Seems like when it is reinitialized with the changed index it is then able to search on the newly added field?? Let me know if anyone has encountered this. Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Mail Plus Powerful. Affordable. Sign up now. http://mailplus.yahoo.com -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
A little date help
Hello all, I am indexing the date using the java.io.file.lastModified() method doc.add(new Field(MODIFIED_DT, DateField.timeToString(f.lastModified()), true, true, true)); I am trying to search on this field, but I am having a hard time formatting the date correctly. I am not sure what date format lastModified() uses so trying to come up with a query in milliseconds for the above date field is difficult. Has anyone run into this problem? Is there an easier way to do this? Let me know, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
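[Editor's note] File.lastModified() returns plain milliseconds since the epoch, and DateField.timeToString() encodes that number as a fixed-width string whose lexicographic order matches chronological order — so the easiest fix is to run the query-side date through the very same DateField.timeToString() conversion instead of formatting it by hand. The standalone sketch below illustrates the encoding idea only (base-36 with zero padding; the width here is an assumption, not DateField's exact format):

```java
// Sketch of the sortable-timestamp encoding idea behind DateField: render
// the millisecond value in a fixed width so plain string comparison of the
// indexed terms matches time order. Not Lucene's exact format.
public class SortableTimeSketch {
    static final int WIDTH = 9; // hypothetical fixed width, enough for ms timestamps

    static String timeToString(long millis) {
        String s = Long.toString(millis, Character.MAX_RADIX); // base 36
        StringBuilder b = new StringBuilder();
        for (int i = s.length(); i < WIDTH; i++) b.append('0'); // zero-pad
        return b.append(s).toString();
    }

    public static void main(String[] args) {
        long earlier = 1_000_000_000L;
        long later = 2_000_000_000L;
        // Lexicographic order of the encoded terms matches time order,
        // which is what makes term/range comparisons on the field work.
        System.out.println(timeToString(earlier).compareTo(timeToString(later)) < 0); // true
    }
}
```

The practical takeaway for the thread: never hand-format the query value; encode the query bound with the same function used at index time and the two strings become directly comparable.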
RE: Searching with Multiple Queries
Has anyone gotten a chance to review the below to make sure I am not doing something crazy. Thanks, Rob -Original Message- From: Rob Outar [mailto:[EMAIL PROTECTED]] Sent: Friday, November 15, 2002 12:59 PM To: Lucene Users List Subject: RE: Searching with Multiple Queries I did this and it works now I need you guys, the experts :-) to let me know if I am doing something terribly wrong: Analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field return new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; } Query: releaseability:US Gov only the above returns hits. Let me know. Thanks, Rob -Original Message- From: Aaron Galea [mailto:[EMAIL PROTECTED]] Sent: Friday, November 15, 2002 10:53 AM To: Lucene Users List Subject: Re: Searching with Multiple Queries Rob I was reading again the mail and I think I didn't reply exactly to your question. In the code sent you can remove completely the StandardTokenizer() or else modify the code from JGuru itself. However I can't really tell you myself the effect this will have on your searches or indexing. Perhaps someone else might... Aaron - Original Message - From: Aaron Galea [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 4:35 PM Subject: Re: Searching with Multiple Queries Hi Rob Here is how I think in my case I will do it but the code is not tested so it might not work: 1. 
Create a filter class class SearcherFilter extends Filter { protected String Directory; public SearcherFilter(String dir) { Directory = dir; } public BitSet bits(IndexReader reader) throws IOException { BitSet bits = new BitSet(reader.maxDoc()); TermDocs termDocs = reader.termDocs(); while (termDocs.next()) { int iDoc = termDocs.doc(); org.apache.lucene.document.Document doc = reader.document(iDoc); Field fldDirectory = doc.getField(Directory); String str = fldDirectory.stringValue(); if (str.startsWith(Directory)){ bits.set(iDoc); } } return bits; } } 2. Create an Anlayzer class class SearcherAnalyzer extends Analyzer { /* * An array containing some common words that * are not usually useful for searching. */ private static final String[] STOP_WORDS = { a , and , are , as , at , be , but , by , for , if , in , into, is , it , no , not , of , on , or , s , such, t , that, the , their , then, there , these , they, this, to , was , will, with }; /* * Stop table */ final static private Hashtable stopTable = StopFilter.makeStopTable(STOP_WORDS); /* * create a token stream for this analyser */ public final TokenStream tokenStream(final Reader reader) { try { TokenStream result = new StandardTokenizer(reader); result = new StandardFilter(result); result = new LowerCaseFilter(result); result = new StopFilter(result,stopTable); result = new PorterStemFilter(result); return result; } catch (Exception e) { return null; } } } 3. In the main code use it this way: IndexSearcher searcher =new IndexSearcher(indexLocation); Query qry = QueryParser.parse(question, body, new SearcherAnalyzer()); Hits hits = searcher.search(qry, new SearcherFilter(directory)); In your case if you do not want for example to use the LetterTokenizer() do not included in the tokenStream method of the Anlayzer. 
Hope this helps, Aaron - Original Message - From: Rob Outar [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 4:13 PM Subject: RE: Searching with Multiple Queries For example JGuru has this: public class MyAnalyzer extends Analyzer { private static final Analyzer STANDARD = new StandardAnalyzer(); public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize field called 'element' if (element.equals(field)) { return new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; } else { // use standard analyzer return STANDARD.tokenStream(field, reader); } } } I do not want any of my fields toekenized for now
RE: Not getting any results from query
I did not see where it said that I saw this: 'AND', 'OR', 'NOT', and FieldNames are case sensitive. Terms are case sensitive unless the lower case token filter is used during indexing and search. Field names are case sensitive. Even if it is the query: releaseability:Test R* should be valid. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 1:53 PM To: Lucene Users List Subject: RE: Not getting any results from query Aren't wildcards case sensitive? Check the FAQ. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Thanks for all the good information/advice everyone, have one more little thing, below is my analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; //case insensitive search t = new LowerCaseFilter(t); return t; } Field name = releaseability Value = Test Releaseability; How the field is set up: doc.add(new Field(releaseability, Test Releaseability, true, true, true)); This query works: releaseability:Test* however this one does not: releaseability:Test R* Any ideas why? Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! Web Hosting - Let the expert host your site http://webhosting.yahoo.com -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: Not getting any results from query
Does not work either, I think it has something to do with the space between the two words. This fails test r* but test*r* works. Understanding how the internal of Lucene work is one difficult task but this group does help a lot. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 2:52 PM To: Lucene Users List Subject: RE: Not getting any results from query How does releaseability:test r* work? Returns anything? http://www.jguru.com/faq/view.jsp?EID=538312 Otis --- Rob Outar [EMAIL PROTECTED] wrote: I did not see where it said that I saw this: 'AND', 'OR', 'NOT', and FieldNames are case sensitive. Terms are case sensitive unless the lower case token filter is used during indexing and search. Field names are case sensitive. Even if it is the query: releaseability:Test R* should be valid. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 1:53 PM To: Lucene Users List Subject: RE: Not getting any results from query Aren't wildcards case sensitive? Check the FAQ. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Thanks for all the good information/advice everyone, have one more little thing, below is my analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; //case insensitive search t = new LowerCaseFilter(t); return t; } Field name = releaseability Value = Test Releaseability; How the field is set up: doc.add(new Field(releaseability, Test Releaseability, true, true, true)); This query works: releaseability:Test* however this one does not: releaseability:Test R* Any ideas why? Thanks, Rob -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] __ Do you Yahoo!? Yahoo! 
Web Hosting - Let the expert host your site http://webhosting.yahoo.com -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
RE: Not getting any results from query
I am using the QueryParser class and for that query test r* it is forming a boolean query, not a prefix query. The problem is I allow clients to search on whatever they define, if I knew the fields they were searching on ahead of time then I could use classes that extend Query, but since I do not know I am forced to use QueryParser class. Thanks, Rob -Original Message- From: Rob Outar [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 3:03 PM To: Lucene Users List Subject: RE: Not getting any results from query Does not work either, I think it has something to do with the space between the two words. This fails test r* but test*r* works. Understanding how the internal of Lucene work is one difficult task but this group does help a lot. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 2:52 PM To: Lucene Users List Subject: RE: Not getting any results from query How does releaseability:test r* work? Returns anything? http://www.jguru.com/faq/view.jsp?EID=538312 Otis --- Rob Outar [EMAIL PROTECTED] wrote: I did not see where it said that I saw this: 'AND', 'OR', 'NOT', and FieldNames are case sensitive. Terms are case sensitive unless the lower case token filter is used during indexing and search. Field names are case sensitive. Even if it is the query: releaseability:Test R* should be valid. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Monday, November 18, 2002 1:53 PM To: Lucene Users List Subject: RE: Not getting any results from query Aren't wildcards case sensitive? Check the FAQ. 
Otis --- Rob Outar [EMAIL PROTECTED] wrote: Thanks for all the good information/advice everyone; I have one more little thing. Below is my analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; // case-insensitive search t = new LowerCaseFilter(t); return t; } Field name = "releaseability", value = "Test Releaseability". How the field is set up: doc.add(new Field("releaseability", "Test Releaseability", true, true, true)); This query works: releaseability:Test* however this one does not: releaseability:Test R* Any ideas why? Thanks, Rob
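A note on why the space breaks the query: QueryParser's grammar splits Test R* on whitespace into two clauses before any analyzer runs, but the untokenized (and lowercased) field holds the whole value as the single term "test releaseability", which neither clause matches; Test* works because a prefix query can still match that single term. A minimal sketch of a workaround, assuming the 1.2-era Lucene API and the field name from the thread (the Lucene jar must be on the classpath):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

public class UntokenizedPrefix {
    // Build the PrefixQuery directly so the whitespace never reaches
    // QueryParser's grammar; lowercase the prefix to match the
    // LowerCaseFilter applied at index time.
    public static Query forPrefix(String field, String userPrefix) {
        return new PrefixQuery(new Term(field, userPrefix.toLowerCase()));
    }
}
```

With this, forPrefix("releaseability", "Test R") matches the stored term "test releaseability", since prefix matching is applied to the whole untokenized value.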
RE: Searching with Multiple Queries
I thought this was my problem :-), anyhow can I just write an analyzer that does not tokenize the search string and use it with QueryParser? Thanks, Rob -Original Message- From: Aaron Galea [mailto:agale;nextgen.net.mt] Sent: Friday, November 15, 2002 9:44 AM To: Lucene Users List Subject: Re: Searching with Multiple Queries Ok, I will let you know the result. Thanks, Aaron - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 3:37 PM Subject: Re: Searching with Multiple Queries I say: try it :) Otis --- Aaron Galea [EMAIL PROTECTED] wrote: I am not sure, but I was going to do it by using a QueryParser and creating a filter that iterates over the documents. For each document I check the directory field and use the String.startsWith() function to make it work like a prefix query. The Query and the Filter are then used in the IndexSearcher. I have not tried it yet but I think it will work; what do you say? Thanks Aaron - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 3:06 PM Subject: Re: Searching with Multiple Queries Sounds like 2 queries to me. You could do a prefix AND phrase, but that won't be exactly the same as doing a phrase query on a subset of the results of the prefix query. Otis --- Aaron Galea [EMAIL PROTECTED] wrote: Hi everyone, I have indexed my documents hierarchically by adding a directory field that is indexable but non-tokenized, as suggested in the FAQ. Now I want to do a search first using a prefix query and then apply a phrase query to the returned results. Is this possible? Can it be applied in one go? I am not sure whether MultiFieldQueryParser can be used this way. Any suggestions??? Thanks Aaron
RE: Searching with Multiple Queries
I did this and it works; now I need you guys, the experts :-), to let me know if I am doing something terribly wrong. Analyzer: public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize any field return new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; } Query: releaseability:"US Gov" - only the above returns hits. Let me know. Thanks, Rob -Original Message- From: Aaron Galea [mailto:agale;nextgen.net.mt] Sent: Friday, November 15, 2002 10:53 AM To: Lucene Users List Subject: Re: Searching with Multiple Queries Rob, I was reading the mail again and I think I didn't reply exactly to your question. In the code I sent you can remove the StandardTokenizer() completely, or else modify the code from JGuru itself. However, I can't really tell you myself what effect this will have on your searches or indexing. Perhaps someone else might... Aaron - Original Message - From: Aaron Galea [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 4:35 PM Subject: Re: Searching with Multiple Queries Hi Rob, Here is how I think I will do it in my case, but the code is not tested so it might not work: 1. Create a filter class: class SearcherFilter extends Filter { protected String Directory; public SearcherFilter(String dir) { Directory = dir; } public BitSet bits(IndexReader reader) throws IOException { BitSet bits = new BitSet(reader.maxDoc()); TermDocs termDocs = reader.termDocs(); while (termDocs.next()) { int iDoc = termDocs.doc(); org.apache.lucene.document.Document doc = reader.document(iDoc); Field fldDirectory = doc.getField("directory"); String str = fldDirectory.stringValue(); if (str.startsWith(Directory)) { bits.set(iDoc); } } return bits; } } 2. Create an Analyzer class: class SearcherAnalyzer extends Analyzer { /* * An array containing some common words that * are not usually useful for searching. 
*/ private static final String[] STOP_WORDS = { "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with" }; /* * Stop table */ final static private Hashtable stopTable = StopFilter.makeStopTable(STOP_WORDS); /* * create a token stream for this analyser */ public final TokenStream tokenStream(final Reader reader) { try { TokenStream result = new StandardTokenizer(reader); result = new StandardFilter(result); result = new LowerCaseFilter(result); result = new StopFilter(result, stopTable); result = new PorterStemFilter(result); return result; } catch (Exception e) { return null; } } } 3. In the main code use it this way: IndexSearcher searcher = new IndexSearcher(indexLocation); Query qry = QueryParser.parse(question, "body", new SearcherAnalyzer()); Hits hits = searcher.search(qry, new SearcherFilter(directory)); In your case, if you do not want to use the LetterTokenizer(), for example, do not include it in the tokenStream method of the Analyzer. Hope this helps, Aaron - Original Message - From: Rob Outar [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, November 15, 2002 4:13 PM Subject: RE: Searching with Multiple Queries For example, JGuru has this: public class MyAnalyzer extends Analyzer { private static final Analyzer STANDARD = new StandardAnalyzer(); public TokenStream tokenStream(String field, final Reader reader) { // do not tokenize field called 'element' if ("element".equals(field)) { return new CharTokenizer(reader) { protected boolean isTokenChar(char c) { return true; } }; } else { // use standard analyzer return STANDARD.tokenStream(field, reader); } } } I do not want any of my fields tokenized for now, so I was thinking about using the above code with a few slight modifications... 
Thanks, Rob -Original Message- From: Rob Outar [mailto:routar;ideorlando.org] Sent: Friday, November 15, 2002 10:10 AM To: Lucene Users List Subject: RE: Searching
Not getting any results from query
Hello all, I am storing the field in this fashion: doc.add(new Field("releaseability", "Test Releaseability", true, true, false)); so it is indexed and stored but not tokenized. The value is "Test Releaseability"; I am using the query releaseability:"test releaseability" but I am not getting any results. Is my query wrong? Let me know. Thanks, Rob
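The mismatch here: an indexed-but-untokenized field is stored in the index as one exact, case-preserved term ("Test Releaseability"), while QueryParser runs the query text through the analyzer (lowercasing and possibly splitting it), so the two never line up. A sketch of the direct route, assuming the 1.2-era Lucene API and the field/value from this message (requires the Lucene jar):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class ExactMatch {
    // Match an untokenized field with a TermQuery built from the exact,
    // case-sensitive stored value; no analyzer is involved at all.
    public static Hits find(IndexSearcher searcher) throws java.io.IOException {
        Term t = new Term("releaseability", "Test Releaseability");
        return searcher.search(new TermQuery(t));
    }
}
```

The alternative, used later in this thread, is to give QueryParser an analyzer that neither tokenizes nor changes case differently from indexing, so parsed and indexed terms agree.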
RE: Multiple field searches using AND and OR's
Looked at that already; the format is this: public static Query parse(String query, String[] fields, Analyzer analyzer) throws ParseException. Parses a query which searches on the fields specified. If x fields are specified, this effectively constructs: (field1:query) (field2:query) (field3:query)...(fieldx:query) My query value will not be the same. This lets you query multiple fields with the same query; my query string will be different, e.g. f_name = rob and l_name = outar or address = some value, stuff like that. Plus there is no way of specifying ORs and ANDs. Thanks, Rob O -Original Message- From: Kelvin Tan [mailto:kelvin-lists;relevanz.com] Sent: Wednesday, November 13, 2002 9:42 AM To: Lucene Users List Subject: Re: Multiple field searches using AND and OR's Rob, I believe MultiFieldQueryParser will do the job for you... Regards, Kelvin On Wed, 13 Nov 2002 08:58:36 -0500, Rob Outar said: Hello all, I am wondering how I would do multiple field searches of the form: field1 = value and field2 = value2 or field2 = value3 I am thinking that each one of the above would be a term query, but how would I string them together with ANDs and ORs? Any help would be appreciated. Thanks, Rob PS I found this in the FAQ, but I was wondering if there was any other way to do it: My documents have multiple fields, do I have to replicate a query for each of them? Not necessarily. A simple solution is to index the documents using a general field that contains a concatenation of the content of all the searchable fields ('author', 'title', 'body' etc). This way, a simple query will search the entire document content. The disadvantage of this method is that you cannot boost certain fields relative to others. Note also that matches in longer documents result in lower ranking. 
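The AND/OR combination Rob describes can be assembled programmatically instead of through a parser: one TermQuery per field/value pair, nested inside BooleanQuery objects. A sketch assuming the 1.2-era Lucene API, where add() takes (query, required, prohibited) flags; the field names are the examples from the thread (Lucene jar required):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MultiFieldBoolean {
    // Builds: (f_name:rob AND l_name:outar) OR address:"some value"
    public static Query build() {
        BooleanQuery and = new BooleanQuery();
        // add(query, required, prohibited): required=true gives AND semantics
        and.add(new TermQuery(new Term("f_name", "rob")), true, false);
        and.add(new TermQuery(new Term("l_name", "outar")), true, false);

        BooleanQuery or = new BooleanQuery();
        // required=false on every clause gives OR semantics
        or.add(and, false, false);
        or.add(new TermQuery(new Term("address", "some value")), false, false);
        return or;
    }
}
```

Each leaf TermQuery must use the term exactly as it was indexed (lowercased here if the indexing analyzer lowercases), since no analyzer runs on programmatically built queries.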
RE: Several fields with the same name
Would the solution be to call Document.fields(), iterate through that enumeration, and get my data? Thanks, Rob -Original Message- From: Rob Outar [mailto:routar;ideorlando.org] Sent: Wednesday, November 06, 2002 2:46 PM To: Lucene Users List Subject: Several fields with the same name Hello all, I have a relationship where one key has many values, basically a 1-to-many relationship. For example, key = name, values = bob, jim, etc. When a client wants all the values that have been associated with the field name, how would I get that? The javadoc for Document.get(String name) states: Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields may exist with this name, this method returns the last one added. I don't need the last field's value, I need all values associated with that field. Any help would be appreciated. Thanks, Rob
RE: Several fields with the same name
Cool, so it will keep getting the last value excluding the one it just fetched? Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Wednesday, November 06, 2002 2:57 PM To: Lucene Users List Subject: Re: Several fields with the same name Looking at the source, it looks like you can just call it multiple times until it returns null. Otis --- Rob Outar [EMAIL PROTECTED] wrote: Hello all, I have a relationship where one key has many values, basically a 1-to-many relationship. For example, key = name, values = bob, jim, etc. When a client wants all the values that have been associated with the field name, how would I get that? The javadoc for Document.get(String name) states: Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields may exist with this name, this method returns the last one added. I don't need the last field's value, I need all values associated with that field. Any help would be appreciated. Thanks, Rob
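To be precise about Document.get(String): it returns the last value added for that name every time it is called, so repeated calls do not walk the list. The reliable route is the one Rob suggested, iterating the document's full field enumeration. A sketch assuming the 1.2-era Lucene API, where fields() returns a java.util.Enumeration (later Lucene versions add a Document.getValues(String) convenience for exactly this):

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MultiValuedField {
    // Collect every value stored under the given field name: the document
    // keeps one Field instance per add() call, so walking fields() and
    // filtering by name recovers all of them, in insertion order.
    public static List getValues(Document doc, String name) {
        List values = new ArrayList();
        for (Enumeration e = doc.fields(); e.hasMoreElements(); ) {
            Field f = (Field) e.nextElement();
            if (f.name().equals(name)) {
                values.add(f.stringValue());
            }
        }
        return values;
    }
}
```

For key = name with values bob and jim added as two fields, getValues(doc, "name") returns both, not just the last one added.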
Lucene and XML
Hello all, I did not know there were packages like ISOGEN that used Lucene to build a searchable index based on XML files. From visiting ISOGEN's website it looks like commercial software; are there any open-source extensions to Lucene that allow XML indexing and searching? Please let me know. Thanks again, Rob
RE: User Base
Is Lucene GPL or Apache? Thanks, Rob -Original Message- From: Craig Walls [mailto:wallsc;michaels.com] Sent: Friday, October 25, 2002 10:32 AM To: Lucene Users List Subject: Re: User Base Absolutely--I'm very aware of how the various OS licenses work and we avoid GPL like the plague. In fact, doing a quick mental inventory of the OS stuff we've used, I believe that all of it has been under the Apache license. We tinkered with an LGPL project once, but never actually used it in production code. Robert A. Decker wrote: Your boss should be very worried about the software being brought into your projects - not because of security but because of viral licenses. The GPL is particularly heinous. The Apache and FreeBSD licenses are excellent. Take a look at: http://www.oreillynet.com/lpt/a//policy/2001/12/12/transition.html http://www.apache.org/foundation/licence-FAQ.html thanks, rob http://www.robdecker.com/ http://www.planetside.com/ On Fri, 25 Oct 2002, Craig Walls wrote: Unofficially, my company is using Lucene for searching for products and projects on our web-site. By unofficially I mean that while my boss knows we're using Lucene, my boss' boss doesn't know, because he's very reluctant to buy into this open-source thing. (We've used other OS projects in our own projects as well... it's been a don't-ask-don't-tell kinda thing.) We launched our new search about 2 weeks ago and it rocks! In the end, we've fully met and in many cases exceeded expectations with Lucene, but they just don't know that we're using Lucene. Rob Outar wrote: All, I am trying to sell my lead on using this awesome search/indexing engine, but he wants to know the user base for this product. He wants to be assured that if we choose this product it will not go away, and that we will have some form of support. Worst case, of course, we do have the source. 
Anyhow, if anyone can let me know what the user base is, or anything that would lead to some assurance for him, I would greatly appreciate it. Thanks, Rob
RE: Error when trying to match file path
Thanks for the reply, but I am already using a standard analyzer: Analyzer analyzer = new StandardAnalyzer(); The path is being stored as a string, so I do not know why I cannot find a match when I use the path as a query. Do I need to phrase the query differently? QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Let me know. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Wednesday, October 23, 2002 5:54 PM To: Lucene Users List Subject: Re: Error when trying to match file path http://www.jguru.com/faq/view.jsp?EID=538308 --- Rob Outar [EMAIL PROTECTED] wrote: Hi all, I am indexing the file path with the below: Document doc = new Document(); doc.add(Field.UnIndexed("path", f.getAbsolutePath())); I then try to run the following after building the index: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Hits hits = this.searcher.search(this.query); It returns zero hits?!? What am I doing wrong? Any help would be appreciated. Thanks, Rob
RE: Error when trying to match file path
Some more information; with the following: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); System.out.println(this.query.toString("path")); I got: F:onesaf dev block b pair dev unittestdatafiles tools unitcomposer.xml So it looks like the QueryParser is stripping out all the \ characters and doing something with the F:\. Would anyone happen to know why this is happening? Do I need to use a different query to get the information I need? Thanks, Rob -Original Message- From: Rob Outar [mailto:routar;ideorlando.org] Sent: Wednesday, October 23, 2002 5:48 PM To: [EMAIL PROTECTED] Subject: Error when trying to match file path Hi all, I am indexing the file path with the below: Document doc = new Document(); doc.add(Field.UnIndexed("path", f.getAbsolutePath())); I then try to run the following after building the index: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Hits hits = this.searcher.search(this.query); It returns zero hits?!? What am I doing wrong? Any help would be appreciated. Thanks, Rob
RE: Error when trying to match file path
I cannot get this to work for the life of me. I am using a StandardAnalyzer now. Question 1: What field type should the path be? doc.add(Field.UnIndexed("path", f.getAbsolutePath())); where f is a File object. Question 2: What should the query be to retrieve that one file? Term: Term t = new Term("path", file.getAbsolutePath()); TermQuery tQ = new TermQuery(t); System.out.println(tQ.toString("path")); Hits hits = this.searcher.search(tQ); System.out.println(hits.length() + " total matching documents"); or: /* QueryParser parser = new QueryParser(); System.out.println(file.getAbsolutePath()); this.query = parser.parse(file.getAbsolutePath()); System.out.println(this.query.toString("path")); Hits hits = this.searcher.search(this.query); System.out.println(hits.length() + " total matching documents"); Document doc = hits.doc(0); System.out.println("class = " + doc.get("classification")); */ If I can get past this hurdle I will so be on my way. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Thursday, October 24, 2002 10:37 AM To: Lucene Users List Subject: RE: Error when trying to match file path The Analyzer is stripping your \ characters. Query Parser doesn't do that... Otis --- Rob Outar [EMAIL PROTECTED] wrote: Some more information; with the following: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); System.out.println(this.query.toString("path")); I got: F:onesaf dev block b pair dev unittestdatafiles tools unitcomposer.xml So it looks like the QueryParser is stripping out all the \ characters and doing something with the F:\. Would anyone happen to know why this is happening? Do I need to use a different query to get the information I need? 
Thanks, Rob -Original Message- From: Rob Outar [mailto:routar;ideorlando.org] Sent: Wednesday, October 23, 2002 5:48 PM To: [EMAIL PROTECTED] Subject: Error when trying to match file path Hi all, I am indexing the file path with the below: Document doc = new Document(); doc.add(Field.UnIndexed("path", f.getAbsolutePath())); I then try to run the following after building the index: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Hits hits = this.searcher.search(this.query); It returns zero hits?!? What am I doing wrong? Any help would be appreciated. Thanks, Rob
User Base
All, I am trying to sell my lead on using this awesome search/indexing engine, but he wants to know the user base for this product. He wants to be assured that if we choose this product it will not go away, and that we will have some form of support. Worst case, of course, we do have the source. Anyhow, if anyone can let me know what the user base is, or anything that would lead to some assurance for him, I would greatly appreciate it. Thanks, Rob
Setting fields
Hello, Is there a way to set a field once it has been associated with a document? For example, if I have a field named filename and the file is renamed, I now need to update the filename field with the new name of the file. I did not see any setter methods on Field. The only solution that comes to mind is to fetch the document based on its URL, remove it from the index, then re-add it with the new value. Let me know. Thanks, Rob
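There is indeed no setter: once indexed, a Lucene document is immutable, so an "update" is a delete by a unique key term followed by re-adding the rebuilt document. A sketch assuming the 1.2-era Lucene API and a keyword field named "url" as the unique key (the field names are illustrative; the Lucene jar is required):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateFilename {
    public static void rename(String indexDir, String url, String newName)
            throws java.io.IOException {
        // 1. Delete the stale document, located by its unique key term.
        //    Deletes go through IndexReader in this era of the API.
        IndexReader reader = IndexReader.open(indexDir);
        reader.delete(new Term("url", url));
        reader.close();

        // 2. Re-add a rebuilt document (create=false appends to the index).
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
                                             false);
        Document doc = new Document();
        doc.add(Field.Keyword("url", url));
        doc.add(Field.Text("filename", newName));
        // ... re-add the document's other fields here ...
        writer.addDocument(doc);
        writer.close();
    }
}
```

Note that every field of the document must be re-added, not just the changed one, which is why storing the fields you need to reconstruct the document pays off.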
RE: Error when trying to match file path
I finally got it to work, but I do not understand the solution. Instead of storing the file path (F:\blah\blah\blah.xml), I stored the URL of the file in the field called path; I was then able to use a TermQuery on ("path", URL) to retrieve that one document from the index. Thanks, Rob -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Thursday, October 24, 2002 10:37 AM To: Lucene Users List Subject: RE: Error when trying to match file path The Analyzer is stripping your \ characters. Query Parser doesn't do that... Otis --- Rob Outar [EMAIL PROTECTED] wrote: Some more information; with the following: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); System.out.println(this.query.toString("path")); I got: F:onesaf dev block b pair dev unittestdatafiles tools unitcomposer.xml So it looks like the QueryParser is stripping out all the \ characters and doing something with the F:\. Would anyone happen to know why this is happening? Do I need to use a different query to get the information I need? Thanks, Rob -Original Message- From: Rob Outar [mailto:routar;ideorlando.org] Sent: Wednesday, October 23, 2002 5:48 PM To: [EMAIL PROTECTED] Subject: Error when trying to match file path Hi all, I am indexing the file path with the below: Document doc = new Document(); doc.add(Field.UnIndexed("path", f.getAbsolutePath())); I then try to run the following after building the index: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Hits hits = this.searcher.search(this.query); It returns zero hits?!? What am I doing wrong? Any help would be appreciated. Thanks, Rob
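The likely explanation for why this works: Field.UnIndexed, used earlier in the thread, stores a value but never indexes it, so no query can ever match it, and running a raw Windows path through QueryParser lets the analyzer eat the backslashes. A field indexed with Field.Keyword is stored and indexed as one unanalyzed term, and a TermQuery built from the same exact string bypasses both the parser and the analyzer; the URL form simply sidesteps backslash escaping entirely. A sketch assuming the 1.2-era Lucene API (Lucene jar required):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class PathLookup {
    // Index time: Field.Keyword stores the value AND indexes it as a
    // single, unanalyzed term (unlike Field.UnIndexed, which is unsearchable).
    public static void addPath(Document doc, java.io.File f)
            throws java.io.IOException {
        doc.add(Field.Keyword("path", f.toURL().toString()));
    }

    // Search time: a TermQuery built from the same exact string; no analyzer
    // runs, so separators and the drive-letter colon survive untouched.
    public static Hits find(IndexSearcher searcher, java.io.File f)
            throws java.io.IOException {
        Term t = new Term("path", f.toURL().toString());
        return searcher.search(new TermQuery(t));
    }
}
```

The same Keyword-plus-TermQuery pattern would have worked with the raw F:\ path as well, as long as the identical string is used at index and search time.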
Error when trying to match file path
Hi all, I am indexing the file path with the below: Document doc = new Document(); doc.add(Field.UnIndexed("path", f.getAbsolutePath())); I then try to run the following after building the index: this.query = QueryParser.parse(file.getAbsolutePath(), "path", this.analyzer); Hits hits = this.searcher.search(this.query); It returns zero hits?!? What am I doing wrong? Any help would be appreciated. Thanks, Rob