Take a look at the highlighter code; you could implement this on the front
end while processing the page.
Nader
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 25, 2004 10:51 AM
To: [EMAIL PROTECTED]
Subject: Which searched words are found in a
So you basically only want to index parts of your document within
<table> Foo Bar </table> tags,
I'm not sure if there's an easier way, but here's what I do:
1) Parse XML files using JDOM (or any XML parser that floats your boat)
into a Map or an ArrayList
2) Create a Lucene document and loop
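A minimal sketch of step 1, assuming the part to index sits inside table elements as in the question. It uses the JDK's built-in DOM parser rather than JDOM so that it runs without extra jars; the class and method names are made up for illustration, and each returned string would then be added to a Lucene document as an indexed field.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the text of every <table> element.
class TableTextExtractor {
    static List<String> extractTableText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList tables = doc.getElementsByTagName("table");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < tables.getLength(); i++) {
                // Everything outside <table> is ignored and never reaches the index.
                texts.add(tables.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>skip me</p><table>Foo Bar</table></doc>";
        System.out.println(extractTableText(xml));
    }
}
```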
I looked at the highlighter code, but the query term extractor retrieves
the terms from the original query, whereas I only want the terms that were
actually found. The best way is probably to parse the result of the
explain method.
Edvard
Take a look at the highlighter code, you could implement this on the front
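An alternative to parsing the explain output, sketched here without Lucene: intersect the query terms with the tokens of each hit's text. This assumes the hit text is available (for example from a stored field) and that a simple lowercase split is close enough to the analyzer used at index time. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: which of the query terms actually occur in a hit?
class FoundTerms {
    static Set<String> foundTerms(Set<String> queryTerms, String hitText) {
        // Crude stand-in for the index-time analyzer: lowercase, split on non-alphanumerics.
        Set<String> docTokens = new LinkedHashSet<>(
                Arrays.asList(hitText.toLowerCase().split("[^\\p{L}\\p{N}]+")));
        // Keep only the query terms that appear among the document tokens.
        Set<String> found = new LinkedHashSet<>(queryTerms);
        found.retainAll(docTokens);
        return found;
    }
}
```

The query terms are expected to be lowercase already, since the document tokens are lowercased before the comparison.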
I switched to indexing using a text field instead of keyword, then I tried
the following based on various pieces of advice:
PerFieldAnalyzerWrapper pfaw = new PerFieldAnalyzerWrapper(new ChineseAnalyzer());
pfaw.addAnalyzer(language, new WhitespaceAnalyzer());
What is the value of your Parsed query: output?
On May 26, 2004, at 8:39 AM, [EMAIL PROTECTED] wrote:
I switched to indexing using a text field instead of keyword, then I
tried
the following based on various pieces of advice:
PerFieldAnalyzerWrapper pfaw = new
Being a bit of a newbie I had tried putting -language:zh-HK by itself,
where it seems it will always return no results unless you combine it with
a positive term. However I then tried this and it does not seem to build
the query I had hoped for:
Query: hsbc
Parsed query: contents:hsbc
On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote:
Query: hsbc -language:zh-HK
Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc -language:zh -keywords:hk)
(title:hsbc -language:zh -title:hk) (language:hsbc -language:zh -language:HK)
Hits: 169
Not quite what I was
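On the observation that -language:zh-HK by itself returns nothing: a prohibited clause only removes documents from what the positive clauses matched, so with no positive clause there is nothing to remove from. A toy model of that behaviour, using plain sets of document ids rather than Lucene classes (all names here are made up):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a boolean query over document ids; not Lucene code.
class ProhibitedOnlyQuery {
    static Set<Integer> search(List<Set<Integer>> positive, List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> matches : positive) {
            result.addAll(matches);       // candidates come only from positive clauses
        }
        for (Set<Integer> matches : prohibited) {
            result.removeAll(matches);    // prohibited clauses can only subtract
        }
        return result;
    }
}
```

With an empty list of positive clauses the result is always empty, whatever the prohibited clauses match.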
Which Asian languages are supported by Lucene?
What about Korean, Japanese, Thai, ...?
If they are not yet supported, what do I need to do?
Thanks,
Christophe
Hello,
I was wondering if anyone has had problems with memory
usage and MultiSearcher.
My index is composed of two sub-indexes that I search
with a MultiSearcher. The total size of the index is
about 3.7GB with the larger sub-index being 3.6GB and
the smaller being 117MB.
I am using Lucene 1.3
This sounds like a memory leakage situation. If you are using tomcat I
would suggest you make sure you are on a recent version, as it is known to
have some memory leaks in version 4. It doesn't make sense that repeated
queries would use more memory than the most demanding query unless objects
I am trying to index a field in a Lucene document with about 90,000
characters. The problem is that it only indexes part of the document.
It seems to only index about 65,000 characters. So, if I search on terms
that are at the beginning of the text, the search works, but it fails
for terms that
Will,
Thanks for your response. It may be an object leak.
I will look into that.
I just ran some more tests and this time I create a
20GB index by repeatedly merging my large index into
itself.
When I ran my test query against that index I got an
OutOfMemoryError on the very first query. I
Gilberto,
Look at the IndexWriter class. It has a property,
maxFieldLength, which you can set to determine the max
number of characters to be stored in the index.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html
Jim
--- Gilberto Rodriguez
[EMAIL PROTECTED]
Thanks, James... That solved the problem.
On May 26, 2004, at 4:15 PM, James Dunn wrote:
Gilberto,
Look at the IndexWriter class. It has a property,
maxFieldLength, which you can set to determine the max
number of characters to be stored in the index.
How big are your actual Documents? Are you caching Hits? It stores,
internally, up to 200 documents.
Erik
On May 26, 2004, at 4:08 PM, James Dunn wrote:
Will,
Thanks for your response. It may be an object leak.
I will look into that.
I just ran some more tests and this time I create a
James Dunn wrote:
Also I search across about 50 fields but I don't use
wildcard or range queries.
Lucene uses one byte of RAM per document per searched field, to hold the
normalization values. So if you search a 10M document collection with
50 fields, then you'll end up using 500MB of RAM.
If
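The arithmetic above as a small sketch. It only covers the normalization values, at one byte per document per searched field as stated, so treat it as a lower bound rather than a full memory model. The class and method names are made up.

```java
// Back-of-the-envelope estimate of norms memory; not a Lucene API.
class NormsMemoryEstimate {
    static long estimateBytes(long documents, int searchedFields) {
        // One byte per document per searched field.
        return documents * searchedFields;
    }

    public static void main(String[] args) {
        // 10M documents searched across 50 fields.
        System.out.println(estimateBytes(10_000_000L, 50)); // prints 500000000
    }
}
```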
Erik,
Thanks for the response.
My actual documents are fairly small. Most docs only
have about 10 fields. Some of those fields are
stored, however, like the OBJECT_ID, NAME and DESC
fields. The stored fields are pretty small as well.
None should be more than 4KB and very few will
approach
I salute the Lucene community!
It would be a great help if I could get your valuable opinions on the
following issue. I know I could have found more answers to my questions by
reading the documentation, but I did invest some time in this and still
have these questions:
I am (also) building a web
Doug,
Thanks!
I just asked a question regarding how to calculate the
memory requirements for a search. Does this memory
get used only during the search operation itself,
or is it referenced by the Hits object or anything
else after the actual search completes?
Thanks again,
Jim
---
It is cached by the IndexReader and lives until the index reader is
garbage collected. 50-70 searchable fields is a *lot*. How many are
analyzed text, and how many are simply keywords?
Doug
James Dunn wrote:
Doug,
Thanks!
I just asked a question regarding how to calculate the
memory
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
maxFieldLength
public int maxFieldLength
The maximum number of terms that will be indexed for a single field in a
document. This limits the amount of memory required for indexing, so that
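To illustrate why the tail of a long field goes missing, here is a sketch of the effect in plain Java: only the first maxFieldLength terms are kept and the rest are silently dropped. This mimics the documented behaviour and is not Lucene's code; whitespace splitting stands in for the real analyzer, and the class and method names are made up. In Lucene itself the limit is the public int field quoted above, and as far as I recall it defaults to 10,000 terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Imitates the term cutoff described in the Javadoc; not Lucene code.
class FieldLengthLimit {
    static List<String> indexedTerms(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        String[] tokens = trimmed.split("\\s+");
        // Terms past the limit never reach the index, so searches for them fail.
        int kept = Math.min(tokens.length, maxFieldLength);
        return new ArrayList<>(Arrays.asList(tokens).subList(0, kept));
    }
}
```

Note that the limit counts terms, not characters, so the number of characters that survive depends on the average term length.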
Yep, that was the problem... I just needed to increase the
maxFieldLength number.
Thanks...
On May 26, 2004, at 5:56 PM, [EMAIL PROTECTED] wrote:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
maxFieldLength
public int
Hi,
I have a bunch of digits in a field. When I do this search it returns
nothing:
myField:001085609805100
It returns the correct document
when I add a * to the end like this:
myField:001085609805100*   <-- added the *
I'm not sure what is happening here. I'm thinking
Hi,
It looks like it's because I'm using the SimpleAnalyzer instead of the
StandardAnalyzer. What is the SimpleAnalyzer to this query to make it not
work?
Thanks,
Reece
--- Lucene Users List [EMAIL PROTECTED]
wrote:
Hi,
I have a bunch of digits in a field. When I do this search
it
Doug,
We only search on analyzed text fields. There are a
couple of additional fields in the index like
OBJECT_ID that are keywords but we don't search
against those, we only use them once we get a result
back to find the thing that document represents.
Thanks,
Jim
--- Doug Cutting [EMAIL
Whoa! I reread my last post and the last sentence didn't make much sense.
This is what I meant to say:
What is the SimpleAnalyzer doing to this
query to make it not work?
--- Lucene Users List [EMAIL PROTECTED]
wrote:
Hi,
It looks like its because I'm using the SimpleAnalyzer
instead
On May 26, 2004, at 6:38 PM, [EMAIL PROTECTED] wrote:
It looks like its because I'm using the SimpleAnalyzer instead of the
StandardAnalyzer. What is the SimpleAnalyzer to this query to make it
not
work?
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis
It is a good idea to analyze the
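One likely explanation, sketched without Lucene: as far as I know, SimpleAnalyzer breaks text at every non-letter character and lowercases what is left, so a term made only of digits produces no tokens at all. The wildcard version probably works because the query parser does not run wildcard terms through the analyzer, but that is an assumption worth checking against your version. The tokenizer below is an imitation for illustration, not the real class.

```java
import java.util.ArrayList;
import java.util.List;

// Imitation of a letters-only, lowercasing tokenizer; not Lucene code.
class LetterOnlyTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter, digits included, ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("001085609805100")); // prints []
        System.out.println(tokenize("Order 42 Shipped")); // prints [order, shipped]
    }
}
```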
CJKAnalyzer supports the Chinese, Japanese and Korean languages; I'm not
sure about Thai.
I got the CJKAnalyzer from the Lucene sandbox.
- Original Message -
From: Christophe Lombart [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, May 27, 2004 12:01 AM
Subject: Asian
Hi
Lucene developers,
Is it possible to search and retrieve relevant information from the indexed
documents within a specific range, similar to this SQL query:
select * from BOOKSHELF where book1 between 100 and 200
ex:-
search_word , Book between 100
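One thing to watch with ranges like this: Lucene range queries of this era compare terms as strings, not as numbers, so numeric values are usually zero-padded to a fixed width before indexing and before building the range. A sketch of the idea in plain Java, with made-up names and a width chosen only for the example; negative numbers are not handled.

```java
// Illustrates string-ordered ranges over zero-padded numbers; not Lucene code.
class PaddedRange {
    // Pads a non-negative number with leading zeros to a fixed width.
    static String pad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    // Inclusive range check using string comparison, as a term range would.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "1000" sorts between "100" and "200", which is wrong numerically.
        System.out.println(inRange("1000", "100", "200"));
        // Padded: "001000" falls outside "000100".."000200", as expected.
        System.out.println(inRange(pad(1000, 6), pad(100, 6), pad(200, 6)));
    }
}
```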