RE: Which searched words are found in a document

2004-05-26 Thread Nader S. Henein
Take a look at the highlighter code, you could implement this on the front end while processing the page. Nader -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 25, 2004 10:51 AM To: [EMAIL PROTECTED] Subject: Which searched words are found in a

RE: SELECTIVE Indexing

2004-05-26 Thread Nader S. Henein
So you basically only want to index parts of your document within <table> Foo Bar </table> tags, I'm not sure if there's an easier way, but here's what I do: 1) Parse XML files using JDOM (or any XML parser that floats your boat) into a Map or an ArrayList 2) Create a Lucene document and loop

RE: Which searched words are found in a document

2004-05-26 Thread Edvard Scheffers
I looked at the highlighter code, but the query term extractor retrieves the terms from the original query, while I only want the terms that were actually found. The best way is probably to parse the result of the explain method. Edvard Take a look at the highlighter code, you could implement this on the front

Memo: RE: RE: Query parser and minus signs

2004-05-26 Thread alex . bourne
I switched to indexing using a text field instead of keyword, then I tried the following based on various pieces of advice: PerFieldAnalyzerWrapper pfaw = new PerFieldAnalyzerWrapper(new ChineseAnalyzer()); pfaw.addAnalyzer("language", new WhitespaceAnalyzer());

Re: Memo: RE: RE: Query parser and minus signs

2004-05-26 Thread Erik Hatcher
What is the value of your Parsed query: output? On May 26, 2004, at 8:39 AM, [EMAIL PROTECTED] wrote: I switched to indexing using a text field instead of keyword, then I tried the following based on various pieces of advice: PerFieldAnalyzerWrapper pfaw = new

Memo: Re: RE: RE: Query parser and minus signs

2004-05-26 Thread alex . bourne
Being a bit of a newbie I had tried putting -language:zh-HK by itself, where it seems it will always return no results unless you combine it with a positive term. However I then tried this and it does not seem to build the query I had hoped for: Query: hsbc Parsed query: contents:hsbc

Re: Memo: Re: RE: RE: Query parser and minus signs

2004-05-26 Thread Erik Hatcher
On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote: Query: hsbc -language:zh-HK Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk) (language:hsbc -language:zh -language:HK) Hits: 169 Not quite what I was

Asian languages

2004-05-26 Thread Christophe Lombart
Which Asian languages are supported by Lucene? What about Korean, Japanese, Thai, ...? If they are not yet supported, what do I need to do? Thanks, Christophe

Memory usage

2004-05-26 Thread James Dunn
Hello, I was wondering if anyone has had problems with memory usage and MultiSearcher. My index is composed of two sub-indexes that I search with a MultiSearcher. The total size of the index is about 3.7GB with the larger sub-index being 3.6GB and the smaller being 117MB. I am using Lucene 1.3

RE: Memory usage

2004-05-26 Thread wallen
This sounds like a memory leak. If you are using Tomcat, I would suggest you make sure you are on a recent version, as it is known to have some memory leaks in version 4. It doesn't make sense that repeated queries would use more memory than the most demanding query unless objects

Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document. It seems to only index about 65,000 characters. So, if I search on terms that are at the beginning of the text, the search works, but it fails for terms that

RE: Memory usage

2004-05-26 Thread James Dunn
Will, Thanks for your response. It may be an object leak. I will look into that. I just ran some more tests and this time I created a 20GB index by repeatedly merging my large index into itself. When I ran my test query against that index I got an OutOfMemoryError on the very first query. I

Re: Problem Indexing Large Document Field

2004-05-26 Thread James Dunn
Gilberto, Look at the IndexWriter class. It has a property, maxFieldLength, which you can set to determine the max number of characters to be stored in the index. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html Jim --- Gilberto Rodriguez [EMAIL PROTECTED]

Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
Thanks, James... That solved the problem. On May 26, 2004, at 4:15 PM, James Dunn wrote: Gilberto, Look at the IndexWriter class. It has a property, maxFieldLength, which you can set to determine the max number of characters to be stored in the index.

Re: Memory usage

2004-05-26 Thread Erik Hatcher
How big are your actual Documents? Are you caching Hits? It stores, internally, up to 200 documents. Erik On May 26, 2004, at 4:08 PM, James Dunn wrote: Will, Thanks for your response. It may be an object leak. I will look into that. I just ran some more tests and this time I create a

Re: Memory usage

2004-05-26 Thread Doug Cutting
James Dunn wrote: Also I search across about 50 fields but I don't use wildcard or range queries. Lucene uses one byte of RAM per document per searched field, to hold the normalization values. So if you search a 10M document collection with 50 fields, then you'll end up using 500MB of RAM. If
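The rule of thumb Doug gives above can be checked with a little arithmetic. The following is a plain-Java sketch, not Lucene API: the class and method names are made up for illustration, and it models only the one-byte-per-document-per-searched-field cost he describes, nothing else an IndexReader may hold in memory.

```java
// Back-of-the-envelope estimate of norms memory, following the rule in the post:
// one byte of RAM per document per searched field.
// The class and method names are illustrative and not part of Lucene.
class NormsMemoryEstimate {

    /** Bytes needed for norms: 1 byte * number of documents * searched fields. */
    static long estimateBytes(long numDocs, int searchedFields) {
        return numDocs * searchedFields;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000_000L, 50);
        // Decimal megabytes (1 MB = 1,000,000 bytes), matching the figure in the post.
        System.out.println(bytes / 1_000_000L + " MB"); // prints "500 MB"
    }
}
```

Plugging in your own document count and number of searched fields gives a lower bound on the heap a searcher needs before any query-specific allocations.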

Re: Memory usage

2004-05-26 Thread James Dunn
Erik, Thanks for the response. My actual documents are fairly small. Most docs only have about 10 fields. Some of those fields are stored, however, like the OBJECT_ID, NAME and DESC fields. The stored fields are pretty small as well. None should be more than 4KB and very few will approach

classic scenario

2004-05-26 Thread Adrian Dumitru
I salute the Lucene community! It would be a great help to me to get your valuable opinions on the following issue. I know I could have found more answers to my questions by reading the documentation, but I did invest some time in this and still have these questions: I am (also) building a web

Re: Memory usage

2004-05-26 Thread James Dunn
Doug, Thanks! I just asked a question regarding how to calculate the memory requirements for a search. Does this memory get used only during the search operation itself, or is it referenced by the Hits object or anything else after the actual search completes? Thanks again, Jim ---

Re: Memory usage

2004-05-26 Thread Doug Cutting
It is cached by the IndexReader and lives until the index reader is garbage collected. 50-70 searchable fields is a *lot*. How many are analyzed text, and how many are simply keywords? Doug James Dunn wrote: Doug, Thanks! I just asked a question regarding how to calculate the memory

RE: Problem Indexing Large Document Field

2004-05-26 Thread wallen
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH maxFieldLength: public int maxFieldLength. The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that
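To see why a cap on indexed terms makes searches fail only for text near the end of a long field, here is a small plain-Java simulation. It is a sketch of the idea rather than Lucene code: the class, the whitespace splitting, and the tiny cap of 3 are assumptions chosen for illustration, with the `maxFieldLength` parameter name borrowed from the documentation quoted above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration (not Lucene code) of a per-field term cap:
// only the first maxFieldLength terms of a field make it into the "index".
class FieldLengthCapDemo {

    /** Splits text on whitespace and keeps at most maxFieldLength terms. */
    static Set<String> indexTerms(String text, int maxFieldLength) {
        Set<String> indexed = new HashSet<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return indexed;
        }
        String[] terms = trimmed.split("\\s+");
        int limit = Math.min(terms.length, maxFieldLength);
        for (int i = 0; i < limit; i++) {
            indexed.add(terms[i].toLowerCase(Locale.ROOT));
        }
        return indexed;
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma delta epsilon";

        Set<String> capped = indexTerms(text, 3);
        System.out.println(capped.contains("alpha"));   // true: within the cap
        System.out.println(capped.contains("epsilon")); // false: past the cap

        Set<String> raised = indexTerms(text, 10);
        System.out.println(raised.contains("epsilon")); // true once the cap is raised
    }
}
```

Terms near the start of the field are searchable while later ones are silently dropped, and raising the cap makes them searchable again, which is the same pattern described in this thread.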

Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
Yep, that was the problem... I just needed to increase the maxFieldLength number. Thanks... On May 26, 2004, at 5:56 PM, [EMAIL PROTECTED] wrote: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH maxFieldLength public int

Number query not working

2004-05-26 Thread Reece . 1247688
Hi, I have a bunch of digits in a field. When I do this search it returns nothing: myField:001085609805100 It returns the correct document when I add a * to the end like this: myField:001085609805100* -- added the * I'm not sure what is happening here. I'm thinking

Re: Number query not working

2004-05-26 Thread Reece . 1247688
Hi, It looks like it's because I'm using the SimpleAnalyzer instead of the StandardAnalyzer. What is the SimpleAnalyzer to this query to make it not work? Thanks, Reece --- Lucene Users List [EMAIL PROTECTED] wrote: Hi, I have a bunch of digits in a field. When I do this search it

Re: Memory usage

2004-05-26 Thread James Dunn
Doug, We only search on analyzed text fields. There are a couple of additional fields in the index like OBJECT_ID that are keywords but we don't search against those, we only use them once we get a result back to find the thing that document represents. Thanks, Jim --- Doug Cutting [EMAIL

Re: Number query not working

2004-05-26 Thread Reece . 1247688
Whoa! I reread my last post and the last sentence didn't make much sense. This is what I meant to say: What is the SimpleAnalyzer doing to this query to make it not work? --- Lucene Users List [EMAIL PROTECTED] wrote: Hi, It looks like it's because I'm using the SimpleAnalyzer instead

Re: Number query not working

2004-05-26 Thread Erik Hatcher
On May 26, 2004, at 6:38 PM, [EMAIL PROTECTED] wrote: It looks like it's because I'm using the SimpleAnalyzer instead of the StandardAnalyzer. What is the SimpleAnalyzer to this query to make it not work? http://wiki.apache.org/jakarta-lucene/AnalysisParalysis It is a good idea to analyze the
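One likely explanation, offered as an assumption because the reply above is cut off: SimpleAnalyzer breaks text at every non-letter character and lowercases what remains, so a query term made up only of digits produces no tokens at all, whereas a wildcard term is typically handed to the index without being analyzed. The plain-Java sketch below approximates that letter-only tokenization. It is not Lucene code, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Approximation (not Lucene code) of letter-only tokenization:
// runs of letters become lowercase tokens, every other character is a separator.
class LetterTokenizerDemo {

    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("001085609805100"));  // prints []
        System.out.println(letterTokens("Lucene 1.3 rocks")); // prints [lucene, rocks]
    }
}
```

If that is what is happening here, using the same digit-preserving analysis at index time and query time for this field should let the exact-match query find the document.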

Re: Asian languages

2004-05-26 Thread Chandan Tamrakar
CJKAnalyzer supports Chinese, Japanese and Korean; I'm not sure about Thai. I got CJKAnalyzer from the Lucene sandbox. - Original Message - From: Christophe Lombart [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, May 27, 2004 12:01 AM Subject: Asian

Range Query Somebody HELP please

2004-05-26 Thread Karthik N S
Hi Lucene developers, Is it possible to search and retrieve relevant information from the indexed documents within a specific range, similar to an SQL query such as: select * from BOOKSHELF where book1 between 100 and 200 ex:- search_word , Book between 100