Re: Problem Indexing Large Document Field

2004-05-26 Thread James Dunn
Gilberto,

Look at the IndexWriter class.  It has a property,
maxFieldLength, which you can set to determine the max
number of terms to be indexed for each field.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html

Jim

--- Gilberto Rodriguez [EMAIL PROTECTED] wrote:
 I am trying to index a field in a Lucene document with about 90,000
 characters. The problem is that it only indexes part of the document.
 It seems to only index about 65,000 characters. So, if I search on terms
 that are at the beginning of the text, the search works, but it fails
 for terms that are at the end of the document.

 Is there a limitation on how many characters can be stored in a
 document field? Any help would be appreciated, thanks.
 
 
 Gilberto Rodriguez
 Software Engineer
    
 370 CenterPointe Circle, Suite 1178
 Altamonte Springs, FL 32701-3451
    
 407.339.1177 (Ext.112) • phone
 407.339.6704 • fax
 [EMAIL PROTECTED] • email
 www.conviveon.com • web
  
 This e-mail contains legally privileged and
 confidential information 
 intended only for the individual or entity named
 within the message. If 
 the reader of this message is not the intended
 recipient, or the agent 
 responsible to deliver it to the intended recipient,
 the recipient is 
 hereby notified that any review, dissemination,
 distribution or copying 
 of this communication is prohibited. If this
 communication was received 
 in error, please notify me by reply e-mail and
 delete the original 
 message.
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
Thanks,  James... That solved the problem.
On May 26, 2004, at 4:15 PM, James Dunn wrote:
Gilberto,

Look at the IndexWriter class.  It has a property,
maxFieldLength, which you can set to determine the max
number of terms to be indexed for each field.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html

Jim


RE: Problem Indexing Large Document Field

2004-05-26 Thread wallen
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH

maxFieldLength

public int maxFieldLength

The maximum number of terms that will be indexed for a single field in a
document. This limits the amount of memory required for indexing, so that
collections with very large files will not crash the indexing process by
running out of memory.

Note that this effectively truncates large documents, excluding from the
index terms that occur further in the document. If you know your source
documents are large, be sure to set this value high enough to accommodate
the expected size. If you set it to Integer.MAX_VALUE, then the only limit
is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.
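
The excerpt above says the limit counts terms, not characters. The following is a minimal stand-alone sketch of that truncation, not Lucene code: the class name FieldTruncationDemo, the method indexedTerms, and the whitespace tokenizer are illustrative assumptions, and a real analyzer would split text differently. Since the Javadoc shows maxFieldLength as a public int field, raising it in that release would presumably be a plain assignment on the writer before documents are added.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration; it does not call Lucene. It mimics the documented
// behaviour of maxFieldLength: only the first N terms of a field are kept.
class FieldTruncationDemo {

    // Splits on whitespace as a stand-in for an analyzer (assumption) and
    // keeps at most maxFieldLength terms, dropping everything after that.
    static List<String> indexedTerms(String fieldText, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        String trimmed = fieldText.trim();
        if (trimmed.isEmpty()) {
            return kept;
        }
        for (String term : trimmed.split("\\s+")) {
            if (kept.size() >= maxFieldLength) {
                break; // terms past the limit never reach the index
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a field of 12,002 terms: "alpha", 12,000 fillers, "omega".
        StringBuilder text = new StringBuilder("alpha ");
        for (int i = 0; i < 12000; i++) {
            text.append("filler ");
        }
        text.append("omega");

        List<String> withDefault = indexedTerms(text.toString(), 10000);
        List<String> unlimited = indexedTerms(text.toString(), Integer.MAX_VALUE);

        System.out.println(withDefault.contains("alpha")); // true
        System.out.println(withDefault.contains("omega")); // false
        System.out.println(unlimited.contains("omega"));   // true
    }
}
```

With the 10,000-term default the trailing term is never kept, which matches the symptom reported in this thread: searches hit terms near the start of the field and miss terms near the end.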



-Original Message-
From: Gilberto Rodriguez [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 26, 2004 4:04 PM
To: [EMAIL PROTECTED]
Subject: Problem Indexing Large Document Field


I am trying to index a field in a Lucene document with about 90,000 
characters. The problem is that it only indexes part of the document. 
It seems to only index about 65,000 characters. So, if I search on terms 
that are at the beginning of the text, the search works, but it fails 
for terms that are at the end of the document.

Is there a limitation on how many characters can be stored in a 
document field? Any help would be appreciated, thanks





Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
Yep, that was the problem...  I just needed to increase the
maxFieldLength value.

Thanks...
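
For anyone sizing the new value, here is a rough back-of-the-envelope sketch using the numbers from this thread. It assumes the "65,00" in the original post means about 65,000 characters, that the cut-off happened at the 10,000-term default, and that term density is roughly uniform through the document. The class and method names are hypothetical.

```java
// Rough sizing helper; names are illustrative and nothing here calls Lucene.
class MaxFieldLengthEstimate {

    // Estimates how many terms a field holds, given how many characters were
    // covered before truncation at a known term limit.
    static int estimateTermsNeeded(long totalChars, long charsIndexed, int termLimit) {
        double charsPerTerm = (double) charsIndexed / termLimit;
        return (int) Math.ceil(totalChars / charsPerTerm);
    }

    public static void main(String[] args) {
        // 90,000-character field, about 65,000 characters indexed at 10,000 terms.
        int needed = estimateTermsNeeded(90_000, 65_000, 10_000);
        System.out.println(needed); // 13847
    }
}
```

That works out to about 6.5 characters per term, so a 90,000-character field needs a limit of roughly 14,000 terms. Leaving generous headroom above the estimate is probably wise, since the figures are approximate.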
On May 26, 2004, at 5:56 PM, [EMAIL PROTECTED] wrote:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
