RE: Which searched words are found in a document
Take a look at the highlighter code; you could implement this on the front end while processing the page.

Nader

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Tuesday, May 25, 2004 10:51 AM
To: [EMAIL PROTECTED]
Subject: Which searched words are found in a document

Hi,

I have the following question: is there an easy way to see which words from a query were found in a resulting document? If I search for 'cat OR dog' and get a result document with only 'cat' in it, I would like to ask the searcher object (or something similar) to tell me that 'cat' was the only word found in that document. I did see that it is somehow possible with the explain method, but that does not give a clean answer. I could also get the contents of the document and do an indexOf for each search term, but there could be quite a lot of terms in our case. Any suggestions?

Thanks,
Edvard Scheffers
RE: SELECTIVE Indexing
So you basically only want to index the parts of your document inside <table> Foo Bar </table> tags. I'm not sure if there's an easier way, but here's what I do:

1) Parse the XML files using JDOM (or any XML parser that floats your boat) into a Map or an ArrayList.
2) Create a Lucene document and loop through that structure (Map or ArrayList), adding field/value pairs to it like so:

    contentDoc.add(new Field(fieldName, fieldValue, true, true, true));

So all you would need to do is put an if statement around the latter call, to the effect of:

    if (fieldName.equalsIgnoreCase("table")) {
        contentDoc.add(new Field(fieldName, fieldValue, true, true, true));
    }

This may be overkill; someone feel free to correct me if I'm wrong.

Nader

-----Original Message-----
From: Karthik N S
Sent: Wednesday, May 19, 2004 1:01 PM
To: Lucene Users List
Subject: RE: SELECTIVE Indexing

Hey Lucene users,

My original intention was to index certain portions of the HTML [not the whole document]. If JTidy does not support this, then what are my options?

Karthik

-----Original Message-----
From: Viparthi, Kiran (AFIS)
Sent: Wednesday, May 19, 2004 1:43 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing

I doubt if it can be used as a plug-in. Would be good to know if it can be.

Regards,
Kiran

-----Original Message-----
From: Karthik N S
Sent: 17 May 2004 12:30
To: Lucene Users List
Subject: RE: SELECTIVE Indexing

Hi,

Can I use Tidy [as a plug-in] with Lucene?

With regards,
Karthik

-----Original Message-----
From: Viparthi, Kiran (AFIS)
Sent: Monday, May 17, 2004 3:27 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing

Try using Tidy. It creates a Document from the HTML and allows you to apply XPath. Hope this helps.

Kiran

-----Original Message-----
From: Karthik N S
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing

Hi all,

Can somebody tell me how to index only a CERTAIN PORTION of an HTML file, e.g. <table> ... </table>?

With regards,
Karthik
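For illustration, here is a rough sketch of the approach Nader describes, assuming the Lucene 1.3/1.4-era API and JDOM. The element name ("table"), the field names, the class name and the index path are only placeholders, not anything from the original posts:

    import java.util.Iterator;
    import org.jdom.Element;
    import org.jdom.input.SAXBuilder;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class SelectiveIndexer {
        public static void main(String[] args) throws Exception {
            // Parse the source file with JDOM (any parser would do).
            org.jdom.Document xml = new SAXBuilder().build(args[0]);

            IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
            org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
            doc.add(Field.Keyword("path", args[0]));

            // Index only the text found under <table> elements (direct children of the
            // root, for brevity); everything else in the file is skipped.
            // Note getTextTrim() returns only that element's own text; nested markup
            // would need a recursive walk.
            Iterator it = xml.getRootElement().getChildren("table").iterator();
            while (it.hasNext()) {
                Element table = (Element) it.next();
                doc.add(Field.Text("contents", table.getTextTrim()));
            }

            writer.addDocument(doc);
            writer.close();
        }
    }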
RE: Which searched words are found in a document
I looked at the highlighter code, but the query term extractor retrieves the terms from the original query, while I only want the terms that were actually found. The best way is probably to parse the result of the explain method.

Edvard

Nader wrote:
Take a look at the highlighter code; you could implement this on the front end while processing the page.
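For what it's worth, here is a rough sketch of one way to get the matched terms without parsing explain() output: extract the query's terms, then ask the index which of them occur in the hit. This assumes the Lucene 1.4-era API and the sandbox highlighter's QueryTermExtractor; the field name "contents", the class name and the variable names are just placeholders:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.QueryTermExtractor;
    import org.apache.lucene.search.highlight.WeightedTerm;

    public class MatchedTerms {
        // Returns the query terms that actually occur in the given document.
        public static List find(IndexReader reader, Query query, int docId) throws Exception {
            WeightedTerm[] terms = QueryTermExtractor.getTerms(query);
            List found = new ArrayList();
            for (int i = 0; i < terms.length; i++) {
                TermDocs td = reader.termDocs(new Term("contents", terms[i].getTerm()));
                while (td.next()) {
                    if (td.doc() == docId) {         // the hit contains this term
                        found.add(terms[i].getTerm());
                        break;
                    }
                }
                td.close();
            }
            return found;
        }
    }

For a 'cat OR dog' search against a document containing only 'cat', this would return just "cat"; the docId is the internal document number reported by Hits.id(i).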
Memo: RE: RE: Query parser and minus signs
I switched to indexing using a text field instead of a keyword field, then tried the following, based on various pieces of advice:

    PerFieldAnalyzerWrapper pfaw = new PerFieldAnalyzerWrapper(new ChineseAnalyzer());
    pfaw.addAnalyzer("language", new WhitespaceAnalyzer());
    try {
        query = MultiFieldQueryParser.parse(queryString,
                new String[]{"contents", "keywords", "title", "language"},
                (Analyzer) pfaw);
        System.out.println("Parsed query: " + query.toString());
    } catch (ParseException e) {
        error = true;
        e.printStackTrace();
    }

I have tried both language:zh-HK and language:zh\-HK (which appears in the debugger as language:zh\\-HK) as the query, and neither returns any hits. I've tried stepping through the code to see what is being indexed (which looks OK, at least to a relative beginner like myself), and also through the search code, but I'm still none the wiser. Am I doing something wrong, or have I completely missed the point?

To: Alex BOURNE/IBEU/[EMAIL PROTECTED]
Subject: RE: RE: Query parser and minus signs

Remember that Luke does not display the indexed tokens but the stored field, so you would expect to see en-uk in the field.

    doc.add(Field.Keyword("locale", "test-uk"));

Are you adding to the document like this? Also, what analyzer are you using to parse the query?

    org.apache.lucene.analysis.WhitespaceAnalyzer: parses as locale:en-uk
    org.apache.lucene.analysis.SimpleAnalyzer: parses as locale:en uk
    org.apache.lucene.analysis.standard.StandardAnalyzer: parses as locale:en uk

Try using the WhitespaceAnalyzer in Luke and see how it interprets the query. If you are storing as a keyword but searching with tokens, that may be your problem.

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: 24 May 2004 09:50
To: Lucene Users List
Subject: RE: RE: Query parser and minus signs

I tried this, but no, it does not work. I'm concerned that escaping the minus symbol does not appear to work. The field is indexed as a keyword so it is not tokenized - I've checked the contents using Luke, which confirms this.

-----Original Message-----
From: David Townsend
Sent: 21 May 2004 17:02
To: Lucene Users List
Subject: RE: RE: Query parser and minus signs

Doesn't "en UK" as a phrase query work? You're probably indexing it as a text field, so it's being tokenised.

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: 21 May 2004 16:47
To: Lucene Users List
Subject: Memo: RE: Query parser and minus signs

Hmm, we may have to if there is no workaround. We're not using Java locales, but were trying to stick to the ISO standard, which uses hyphens.

-----Original Message-----
From: Ryan Sonnek
Sent: 21 May 2004 16:38
To: Lucene Users List
Subject: RE: Query parser and minus signs

If you're dealing with locales, why not use Java's built-in locale syntax (e.g. en_UK, zh_HK)?

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Friday, May 21, 2004 10:36 AM
To: [EMAIL PROTECTED]
Subject: Query parser and minus signs

Hi all,

I'm using Lucene on a site with split content: one branch containing pages in English and a separate branch in Chinese. Some of the Chinese pages include some (untranslatable) English words, so when a search is carried out in either language you can get pages from the wrong branch. To combat this we introduced a language field into the index which contains the standard language codes en-UK and zh-HK.

When you parse a query such as language:en\-UK, you could reasonably expect the search to recover all pages with the language field set to en-UK (the minus symbol should be escaped by the backslash according to the FAQ). Unfortunately the parser seems to return "en UK" as the parsed query and hence returns no documents. Has anyone else had this problem, or could anyone suggest a workaround? I have yet to find a solution in the mailing list archives or elsewhere.

Many thanks in advance,

Alex Bourne
Re: Memo: RE: RE: Query parser and minus signs
What is the value of your "Parsed query:" output?

On May 26, 2004, at 8:39 AM, [EMAIL PROTECTED] wrote:

I switched to indexing using a text field instead of a keyword field. I have tried both language:zh-HK and language:zh\-HK (which appears in the debugger as language:zh\\-HK) as the query, and neither returns any hits.
Memo: Re: RE: RE: Query parser and minus signs
Being a bit of a newbie, I had tried putting -language:zh-HK by itself, where it seems it will always return no results unless you combine it with a positive term. However, I then tried this, and it does not build the query I had hoped for:

    Query: hsbc
    Parsed query: contents:hsbc keywords:hsbc title:hsbc language:hsbc
    Hits: 206

    Query: hsbc -language:zh-HK
    Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk) (language:hsbc -language:zh -language:HK)
    Hits: 169

Not quite what I was expecting from the parsed query - the zh and HK are now separated.

    Query: hsbc -language:zh\-HK
    Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc -language:zh\-HK) (title:hsbc -language:zh\-HK) (language:hsbc -language:zh\-HK)
    Hits: 206

And I'm guessing here, but I don't think the backslash is escaping anything - does it just become part of the query?

Erik Hatcher wrote on 26 May 2004 15:11:
What is the value of your "Parsed query:" output?
Re: Memo: Re: RE: RE: Query parser and minus signs
On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote:

    Query: hsbc -language:zh-HK
    Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk) (language:hsbc -language:zh -language:HK)
    Hits: 169

    Not quite what I was expecting from the parsed query - the zh and HK are now separated.

I think I can safely say that you are not running the latest version of Lucene. This has been corrected in the 1.4 versions. I've tested this with Wal-Mart (without the quotes) and QueryParser, and it works as expected.

    Query: hsbc -language:zh\-HK
    Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc -language:zh\-HK) (title:hsbc -language:zh\-HK) (language:hsbc -language:zh\-HK)
    Hits: 206

    And I'm guessing here, but I don't think the backslash is escaping anything - does it just become part of the query?

Now that is odd. QueryParser is an awkward beast at times, and combining it with MultiFieldQueryParser (which I'd recommend against, as you can see from the odd queries it built for you) gets even more confusing. Hopefully the latest Lucene 1.4 RC release will fix up your situation.

    Erik
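One way to sidestep the escaping problem altogether, sketched here against the Lucene 1.3/1.4 API (the field names "language" and "contents" follow the thread; the class name, method and default analyzer are illustrative): index the language code with Field.Keyword so it is stored as a single untokenized term, and add the restriction as a TermQuery instead of feeding the hyphenated code through QueryParser.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class LanguageFilter {
        // At index time the document would carry:  doc.add(Field.Keyword("language", "zh-HK"));
        // At search time, parse only the user's words and AND in the language clause.
        public static Query build(String userInput, String langCode) throws Exception {
            Query userQuery = QueryParser.parse(userInput, "contents", new StandardAnalyzer());
            BooleanQuery combined = new BooleanQuery();
            combined.add(userQuery, true, false);                                     // required
            combined.add(new TermQuery(new Term("language", langCode)), true, false); // required
            return combined;
        }
    }

Because the TermQuery is built directly, the hyphen never passes through an analyzer or the parser, so no escaping is needed.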
Asian languages
Which Asian languages are supported by Lucene? What about Korean, Japanese, Thai, ...? If they are not yet supported, what do I need to do?

Thanks,
Christophe
Memory usage
Hello,

I was wondering if anyone has had problems with memory usage and MultiSearcher. My index is composed of two sub-indexes that I search with a MultiSearcher. The total size of the index is about 3.7GB, with the larger sub-index being 3.6GB and the smaller being 117MB. I am using Lucene 1.3 Final with the compound file format. Also, I search across about 50 fields, but I don't use wildcard or range queries.

Doing repeated searches in this way seems to eventually chew up about 500MB of memory, which seems excessive to me. Does anyone have any ideas where I could look to reduce the memory my queries consume?

Thanks,

Jim
RE: Memory usage
This sounds like a memory leak. If you are using Tomcat, I would suggest you make sure you are on a recent version, as version 4 is known to have some memory leaks. It doesn't make sense that repeated queries would use more memory than the most demanding single query, unless objects are not getting freed from memory.

-Will

-----Original Message-----
From: James Dunn
Sent: Wednesday, May 26, 2004 3:02 PM
Subject: Memory usage

I was wondering if anyone has had problems with memory usage and MultiSearcher. Doing repeated searches seems to eventually chew up about 500MB of memory, which seems excessive to me.
Problem Indexing Large Document Field
I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document; it seems to only index about 65,000 characters. So if I search on terms that are at the beginning of the text, the search works, but it fails for terms that are at the end of the document.

Is there a limitation on how many characters can be stored in a document field?

Any help would be appreciated, thanks.

Gilberto Rodriguez
Software Engineer
RE: Memory usage
Will,

Thanks for your response. It may be an object leak; I will look into that.

I just ran some more tests, and this time I created a 20GB index by repeatedly merging my large index into itself. When I ran my test query against that index, I got an OutOfMemoryError on the very first query. I have my heap set to 512MB. Should a query against a 20GB index require that much memory? I page through the results 100 at a time, so I should never have more than 100 Document objects in memory.

Any help would be appreciated, thanks!

Jim

[EMAIL PROTECTED] wrote:
This sounds like a memory leak. If you are using Tomcat, make sure you are on a recent version, as version 4 is known to have some memory leaks.
Re: Problem Indexing Large Document Field
Gilberto,

Look at the IndexWriter class. It has a property, maxFieldLength, which you can set to determine the maximum number of terms that will be indexed for a single field.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html

Jim

Gilberto Rodriguez wrote:
I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document. Is there a limitation on how many characters can be stored in a document field?
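A minimal sketch of raising that limit, assuming the Lucene 1.3/1.4 API, where maxFieldLength is a public field on IndexWriter (the path, the chosen limit, the class name and the sample text are placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BigDocIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
            writer.maxFieldLength = 1000000;   // default is 10,000 terms per field

            Document doc = new Document();
            doc.add(Field.Text("contents", "...the full 90,000-character text..."));
            writer.addDocument(doc);

            writer.close();
        }
    }

The limit must be raised before the large documents are added; a higher limit trades more indexing memory for complete coverage of the document.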
Re: Problem Indexing Large Document Field
Thanks, James... That solved the problem.

On May 26, 2004, at 4:15 PM, James Dunn wrote:
Look at the IndexWriter class. It has a property, maxFieldLength, which you can set...

Gilberto Rodriguez
Software Engineer
Re: Memory usage
How big are your actual Documents? Are you caching Hits? It stores, internally, up to 200 documents.

    Erik

On May 26, 2004, at 4:08 PM, James Dunn wrote:

It may be an object leak; I will look into that. I just ran some more tests, and this time I created a 20GB index by repeatedly merging my large index into itself. When I ran my test query against that index, I got an OutOfMemoryError on the very first query. I have my heap set to 512MB. Should a query against a 20GB index require that much memory? I page through the results 100 at a time, so I should never have more than 100 Document objects in memory.
Re: Memory usage
James Dunn wrote:
Also I search across about 50 fields but I don't use wildcard or range queries.

Lucene uses one byte of RAM per document per searched field, to hold the normalization values. So if you search a 10M document collection with 50 fields, then you'll end up using 500MB of RAM.

If you're using unanalyzed fields, then an easy workaround to reduce the number of fields is to combine many into a single field. So, instead of, e.g., using an "f1" field with value "abc" and an "f2" field with value "efg", use a single field named "f" with the values "1_abc" and "2_efg".

We could optimize this in Lucene: if no values of an indexed field are analyzed, then we could store no norms for the field and hence read none into memory. This wouldn't be too hard to implement...

Doug
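A small sketch of that workaround, using the illustrative field names and values from Doug's example (the class and method names are placeholders): fold the untokenized fields into one field whose values carry a prefix, so only a single norms array of one byte per document is loaded instead of one per field.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class CombinedField {
        public static Document makeDoc() {
            Document doc = new Document();
            // Instead of Field.Keyword("f1", "abc") and Field.Keyword("f2", "efg"):
            doc.add(Field.Keyword("f", "1_abc"));
            doc.add(Field.Keyword("f", "2_efg"));
            return doc;
        }

        // A search for f2 = "efg" becomes a term query against the combined field.
        public static Query f2Equals(String value) {
            return new TermQuery(new Term("f", "2_" + value));
        }
    }

With the 10M-document, 50-field example above, this takes the norms from roughly 500MB down to about 10MB, since only the single combined field needs a norms array.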
Re: Memory usage
Erik,

Thanks for the response.

My actual documents are fairly small. Most docs only have about 10 fields. Some of those fields are stored, however, like the OBJECT_ID, NAME and DESC fields. The stored fields are pretty small as well; none should be more than 4KB, and very few will approach that limit. I'm also using the default maxFieldLength of 10,000 terms. I'm not caching Hits, either.

Could it be my query? I have about 80 total unique fields in the index, although no document has all 80. My query ends up looking like this:

    +(F1:test F2:test ... F80:test)

From previous mails that doesn't look like an enormous number of fields to be searching against. Is there some formula for the amount of memory required for a query, based on the number of clauses and terms?

Jim

Erik Hatcher wrote:
How big are your actual Documents? Are you caching Hits? It stores, internally, up to 200 documents.
classic scenario
I salute the Lucene community! It would be a great help if I could get your opinions on the following. I know I could find more answers by reading the documentation, and I did invest some time in that, but I still have these questions.

I am building a web crawler, a topic-specific one to be more precise, for a vortal. I recently learned about Lucene and I'd very much like to use it to handle keyword searches over the information that I collect. I suspect this is a classic project, at least for Lucene, and probably something like this has been addressed already on this discussion list, so I'm interested in any experience anyone might have with this subject.

My crawler goes out on the internet and extracts, parses, ranks and saves websites. Most of the information is categorized and stored in the database, but I also save about 10 top pages from each site in the filesystem. My questions:

- Should I index these files at the time I extract them from the internet, or later, when I make them available for search? If at extraction time, can I still name my files the way I want (i.e. are there any constraints on file names from Lucene's perspective)?
- Is it OK for the same files repository (or index) that the crawler writes to also be the one the search function searches? I guess performance issues are important here.
- Can I still organize the saved files the way I want? (I planned to write all the files from a given website into different folders, with the folder names taken from the ids in my database.)
- I maintain a taxonomy (a list of categories); each website falls into one or more of these categories, and each website also has a rank. Does Lucene have anything I should be aware of related to this?

I guess that's it for now... this is more like a pet project for me, a pet which keeps growing :) I wouldn't mind any help and opinions you can provide, source code samples, etc.

Big thanks in advance and good luck with your work.

adrian.
Re: Memory usage
Doug,

Thanks! I just asked a question regarding how to calculate the memory requirements for a search. Does this memory get used only during the search operation itself, or is it referenced by the Hits object or anything else after the actual search completes?

Thanks again,

Jim

Doug Cutting wrote:
Lucene uses one byte of RAM per document per searched field, to hold the normalization values. So if you search a 10M document collection with 50 fields, then you'll end up using 500MB of RAM.
Re: Memory usage
It is cached by the IndexReader and lives until the index reader is garbage collected.

50-70 searchable fields is a *lot*. How many are analyzed text, and how many are simply keywords?

Doug

James Dunn wrote:
Does this memory only get used during the search operation itself, or is it referenced by the Hits object or anything else after the actual search completes?
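Since the norms live with the IndexReader, one practical consequence (a common pattern rather than anything specific to this thread; the class name and index path below are illustrative) is to open a single IndexSearcher and share it across queries, rather than opening a new one per request, so the per-field norms are loaded only once:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Searcher;

    public class SearcherHolder {
        private static Searcher searcher;

        // Norms are loaded when the underlying IndexReader opens and stay in memory
        // until it is garbage collected, so reusing one searcher avoids paying that
        // cost on every request. Re-open only when the index has been updated.
        public static synchronized Searcher get() throws IOException {
            if (searcher == null) {
                searcher = new IndexSearcher("/path/to/index");
            }
            return searcher;
        }
    }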
RE: Problem Indexing Large Document Field
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH

maxFieldLength

public int maxFieldLength

The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError. By default, no more than 10,000 terms will be indexed for a field.

-----Original Message-----
From: Gilberto Rodriguez
Sent: Wednesday, May 26, 2004 4:04 PM
Subject: Problem Indexing Large Document Field

I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document. Is there a limitation on how many characters can be stored in a document field?
Re: Problem Indexing Large Document Field
Yeap, that was the problem... I just needed to increase the maxFieldLength number. Thanks.

On May 26, 2004, at 5:56 PM, [EMAIL PROTECTED] wrote:
The maximum number of terms that will be indexed for a single field in a document... By default, no more than 10,000 terms will be indexed for a field.

Gilberto Rodriguez
Software Engineer
Number query not working
Hi,

I have a bunch of digits in a field. When I do this search, it returns nothing:

    myField:001085609805100

It returns the correct document when I add a * to the end, like this:

    myField:001085609805100*   <-- added the *

I'm not sure what is happening here. I'm thinking that Lucene is doing some number conversion internally when it sees only digits, and when I add the * maybe it presumes it is still a string. How do I get a string of digits to work without adding a *?

Thanks,
Reece
Re: Number query not working
Hi,

It looks like it's because I'm using the SimpleAnalyzer instead of the StandardAnalyzer. What is the SimpleAnalyzer to this query to make it not work?

Thanks,
Reece
Re: Memory usage
Doug,

We only search on analyzed text fields. There are a couple of additional fields in the index, like OBJECT_ID, that are keywords, but we don't search against those; we only use them once we get a result back, to find the thing that document represents.

Thanks,

Jim

Doug Cutting wrote:
50-70 searchable fields is a *lot*. How many are analyzed text, and how many are simply keywords?
Re: Number query not working
Whoa! I reread my last post and the last sentence didn't make much sense. This is what I meant to say:

What is the SimpleAnalyzer doing to this query to make it not work?
Re: Number query not working
On May 26, 2004, at 6:38 PM, [EMAIL PROTECTED] wrote:
It looks like it's because I'm using the SimpleAnalyzer instead of the StandardAnalyzer. What is the SimpleAnalyzer doing to this query to make it not work?

http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

It is a good idea to analyze the analyzer. Do a .toString output of the Query and you'll see clearly what happened.

    Erik
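A quick way to see what is happening (a sketch against the Lucene 1.3/1.4 analysis API; the field and class names are illustrative): SimpleAnalyzer tokenizes on letters only, so an all-digit term produces no tokens at all, while WhitespaceAnalyzer keeps it intact.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;

    public class AnalyzerCheck {
        public static void main(String[] args) throws Exception {
            show(new SimpleAnalyzer(), "001085609805100");      // prints no tokens
            show(new WhitespaceAnalyzer(), "001085609805100");  // prints the term unchanged
        }

        static void show(Analyzer a, String text) throws Exception {
            TokenStream ts = a.tokenStream("myField", new StringReader(text));
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(a.getClass().getName() + ": " + t.termText());
            }
        }
    }

Printing query.toString("myField") after parsing shows the same thing: with SimpleAnalyzer the digits vanish from the parsed query, which is why the search returns nothing.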
Re: Asian languages
CJKAnalyzer supports the Chinese, Japanese and Korean languages; I'm not sure about Thai. I got CJKAnalyzer from the Lucene sandbox.

----- Original Message -----
From: Christophe Lombart
To: Lucene Users List
Sent: Thursday, May 27, 2004 12:01 AM
Subject: Asian languages

Which Asian languages are supported by Lucene? What about Korean, Japanese, Thai, ...? If they are not yet supported, what do I need to do?

Thanks,
Christophe
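A minimal sketch of wiring the sandbox CJKAnalyzer in (the package name is the one used by the sandbox contribution at the time; check it against your copy, and note that the same analyzer must be used both for indexing and for query parsing; the paths and field name are placeholders):

    import org.apache.lucene.analysis.cjk.CJKAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class CjkExample {
        public static void main(String[] args) throws Exception {
            CJKAnalyzer analyzer = new CJKAnalyzer();

            IndexWriter writer = new IndexWriter("/tmp/cjk-index", analyzer, true);
            // ... add documents whose text fields contain Chinese/Japanese/Korean here ...
            writer.close();

            // Parse a query with the same analyzer and print how it was tokenized.
            Query q = QueryParser.parse(args[0], "contents", analyzer);
            System.out.println(q.toString("contents"));
        }
    }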
Range Query Sombody HELP please
Hi Lucene developers,

Is it possible to do a search and retrieve relevant information from the indexed documents within a specific range, similar to a query in SQL:

    select * from BOOKSHELF where book1 between 100 and 200

For example: search_word, Book between 100 AND 200 [note: Book is a unique field in the hit info which is already indexed].

Somebody please help me :(

With regards,
Karthik
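For reference, Lucene 1.3/1.4 does support this, both programmatically (RangeQuery) and through QueryParser syntax such as book:[100 TO 200]. One caveat: ranges compare terms lexicographically, so numeric values should be indexed zero-padded to a fixed width for the comparison to behave like SQL BETWEEN. A rough sketch, with illustrative field and class names and an assumed five-digit padding:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeQuery;

    public class RangeExample {
        // search_word AND Book BETWEEN 100 AND 200 (values indexed zero-padded, e.g. "00100")
        public static Query build(String searchWord) throws Exception {
            Query words = QueryParser.parse(searchWord, "contents", new StandardAnalyzer());
            Query range = new RangeQuery(new Term("book", "00100"),
                                         new Term("book", "00200"),
                                         true);   // inclusive of both endpoints
            BooleanQuery combined = new BooleanQuery();
            combined.add(words, true, false);   // required
            combined.add(range, true, false);   // required
            return combined;
        }
    }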