[ 
https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559887#action_12559887
 ] 

Michael McCandless commented on LUCENE-1084:
--------------------------------------------

{quote}
This kind of limit is common in web search engines. It keeps the really big 
pages that crawlers find from blowing up indexing and search (think of a 
100MB PDF that claims to be a text file). So changing it might indeed hurt 
folks who're indexing uncontrolled web content.
{quote}

OK, it seems like it's an important safeguard, and risky to change, so
let's wait for 3.0.

Maybe we could increase the default from 10,000 to 100,000 terms to
reduce how often a legitimate document is truncated?
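
To make the tradeoff concrete, here is a rough sketch of what a
per-field term cap does to a large document. This is not Lucene code:
FieldLengthSketch, indexedTerms and syntheticDoc are hypothetical
names, and whitespace splitting stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the indexing loop; not part of Lucene. */
public class FieldLengthSketch {

    /** Keeps at most maxFieldLength terms; later terms are silently dropped. */
    public static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        // Assumption: whitespace tokenization stands in for a real analyzer.
        for (String term : text.trim().split("\\s+")) {
            if (kept.size() >= maxFieldLength) {
                break; // the cap is reached; no error or warning is raised
            }
            if (!term.isEmpty()) {
                kept.add(term);
            }
        }
        return kept;
    }

    /** Builds a synthetic document of n terms: "t0 t1 ... t(n-1)". */
    public static String syntheticDoc(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append('t').append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = syntheticDoc(25_000);
        // With a 10,000-term cap the last 15,000 terms are lost.
        System.out.println(indexedTerms(doc, 10_000).size());
        // With a 100,000-term cap the whole document is kept.
        System.out.println(indexedTerms(doc, 100_000).size());
    }
}
```

A 25,000-term document loses its tail under the smaller cap and
survives intact under the larger one, while a runaway 100MB file would
still be cut off in both cases.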

{quote}
An alternative to changing the default setting would be to not have a default - 
make it a required parameter to the IndexWriter constructor. That way, there is 
no silent loss (or gain) of content - the user must specify.
{quote}

I think this is a good idea; it basically forces the user to confront
the truncation issue up front.
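
A minimal sketch of the "no default" idea, assuming a hypothetical
writer class: ExplicitLimitWriter, UNLIMITED, addField and the two
counters are made up for illustration and are not the IndexWriter API.
The only point is that there is no constructor without the limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer with no default field length; not the IndexWriter API. */
public class ExplicitLimitWriter {

    /** Callers who want no truncation have to say so explicitly. */
    public static final int UNLIMITED = Integer.MAX_VALUE;

    private final int maxFieldLength;
    private final List<String> terms = new ArrayList<>();
    private int droppedTerms = 0;

    /** The limit is a required argument; there is no no-arg constructor. */
    public ExplicitLimitWriter(int maxFieldLength) {
        if (maxFieldLength <= 0) {
            throw new IllegalArgumentException("maxFieldLength must be positive");
        }
        this.maxFieldLength = maxFieldLength;
    }

    /** Indexes up to maxFieldLength terms of this field and counts the rest. */
    public void addField(String text) {
        int keptInField = 0;
        // Assumption: whitespace tokenization stands in for a real analyzer.
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (keptInField < maxFieldLength) {
                terms.add(term);
                keptInField++;
            } else {
                droppedTerms++;
            }
        }
    }

    public int indexedTermCount() {
        return terms.size();
    }

    public int droppedTermCount() {
        return droppedTerms;
    }

    public static void main(String[] args) {
        ExplicitLimitWriter capped = new ExplicitLimitWriter(3);
        capped.addField("a b c d e");
        System.out.println(capped.indexedTermCount() + " kept, "
                + capped.droppedTermCount() + " dropped");

        ExplicitLimitWriter open = new ExplicitLimitWriter(UNLIMITED);
        open.addField("a b c d e");
        System.out.println(open.indexedTermCount() + " kept, "
                + open.droppedTermCount() + " dropped");
    }
}
```

Code that forgets the limit simply does not compile, so truncation
becomes a decision the user makes rather than a surprise found later.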


> increase default maxFieldLength?
> --------------------------------
>
>                 Key: LUCENE-1084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1084
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Daniel Naber
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>
> To my understanding, Lucene 2.3 will easily index large documents. So 
> shouldn't we get rid of the 10,000 default limit for the field length? 10,000 
> isn't that much and as Lucene doesn't have any error logging by default, this 
> is a common problem for users that is difficult to debug if you don't know 
> where to look.
> A better new default might be Integer.MAX_VALUE.


