[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559887#action_12559887 ]
Michael McCandless commented on LUCENE-1084:
--------------------------------------------

{quote}
This kind of limit is common on web search engines. It prevents really big pages that crawlers find causing indexing and search from blowing up (think a 100MB PDF that claims it is a text file). So changing it might indeed hurt folks who're indexing uncontrolled web content.
{quote}

OK, it seems like it's an important safeguard, and risky to change, so let's wait for 3.0. Maybe we could increase it from 10K --> 100K to reduce the times when a legit document is truncated?

{quote}
An alternative to changing the default setting would be to not have a default - make it a required parameter to the IndexWriter constructor. That way, there is no silent loss (or gain) of content - the user must specify.
{quote}

I think this is a good idea; it basically forces the user to confront the truncation issue up front.

> increase default maxFieldLength?
> --------------------------------
>
>                 Key: LUCENE-1084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1084
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Daniel Naber
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>
> To my understanding, Lucene 2.3 will easily index large documents. So
> shouldn't we get rid of the 10,000 default limit for the field length? 10,000
> isn't that much and as Lucene doesn't have any error logging by default, this
> is a common problem for users that is difficult to debug if you don't know
> where to look.
> A better new default might be Integer.MAX_VALUE.
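
To make the "required parameter" idea from the comment concrete, here is a minimal, self-contained sketch. It is not Lucene code: the class FieldLimitSketch, its Result type, and the index method are hypothetical names invented for illustration, and whitespace splitting stands in for a real analyzer. It only shows the two properties discussed in the thread, assuming the limit counts tokens per field: the caller must pass the limit explicitly, and truncation is reported instead of happening silently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of the Lucene API.
class FieldLimitSketch {

    /** Outcome of tokenizing one field under an explicit token limit. */
    static final class Result {
        final List<String> kept; // tokens that would be indexed
        final int dropped;       // tokens discarded past the limit

        Result(List<String> kept, int dropped) {
            this.kept = kept;
            this.dropped = dropped;
        }

        boolean truncated() {
            return dropped > 0;
        }
    }

    /**
     * Splits text on whitespace and keeps at most maxFieldLength tokens.
     * There is no default limit: the caller has to choose one, which is
     * the point of the "required parameter" proposal.
     */
    static Result index(String text, int maxFieldLength) {
        if (maxFieldLength <= 0) {
            throw new IllegalArgumentException("maxFieldLength must be positive");
        }
        List<String> kept = new ArrayList<>();
        int dropped = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            if (kept.size() < maxFieldLength) {
                kept.add(token);
            } else {
                dropped++;
            }
        }
        return new Result(kept, dropped);
    }

    public static void main(String[] args) {
        Result r = index("one two three four five", 3);
        // prints: [one, two, three] dropped=2
        System.out.println(r.kept + " dropped=" + r.dropped);
    }
}
```

Passing Integer.MAX_VALUE as the limit in this sketch corresponds to the "no truncation" default proposed in the issue description, while a caller indexing uncontrolled web content could pass a small value and check truncated() to log the loss.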