[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049856#comment-13049856 ]
Christopher Currens commented on LUCENENET-417:
-----------------------------------------------

That's a valid question. I think it's most common (though not limited to) when Lucene is used to index file systems. As an example, the text extracted from some xls files can be *shudder* in the hundreds of MB. When accuracy is needed in a search, MaxFieldLength.Unlimited becomes important, as we don't want silent truncation of search terms. The idea of streaming it, as I said before, was more about managing _program memory_, especially when multiple indexes are read or written at the same time, than about the ability to index a large file.

Granted, there are other ways to solve the problem, such as what you more or less suggested: breaking a larger file into smaller chunks. However, not all data is divisible the way a book is, so it's not an ideal solution, especially if you're storing file metadata along with the full text.

> implement streams as field values
> ---------------------------------
>
>                 Key: LUCENENET-417
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-417
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: Lucene.Net Core
>            Reporter: Christopher Currens
>         Attachments: StreamValues.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole
> binary data must be loaded into memory and then written to the index. Adding
> the ability to use a stream instead of a byte array could not only speed up
> the indexing process, but reduce the memory footprint as well.
> -Java Lucene has the ability to use a TextReader to both analyze and store
> text in the index.- Lucene.NET lacks the ability to store string data in the
> index via streams. This should be a feature added into Lucene.NET as well.
> My thoughts are to add another Field constructor, that is Field(string name,
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the
> text to be analyzed and stored into the index.
> Comments about this approach are greatly appreciated.
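
The memory argument in this thread (consume a stream piece by piece instead of loading the whole value into a byte array first) can be illustrated with a rough sketch. This is not code from the attached patch and it uses no Lucene.Net API; it is written in Python only for brevity, and the helper names `iter_decoded_chunks` and `count_tokens`, the 4096-byte chunk size, and the use of token counting as a stand-in for analysis are all assumptions made for illustration. In .NET, wrapping a System.IO.Stream in a StreamReader with the chosen Encoding gives the same incremental decoding.

```python
import codecs
import io


def iter_decoded_chunks(stream, encoding="utf-8", chunk_size=4096):
    """Yield decoded text from a binary stream, reading chunk_size bytes at a time.

    Hypothetical helper. The incremental decoder buffers a multi-byte
    character that is split across two reads, so no text is corrupted.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        raw = stream.read(chunk_size)
        if not raw:
            break
        text = decoder.decode(raw)
        if text:
            yield text
    # Flush the decoder; raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def count_tokens(stream, encoding="utf-8", chunk_size=4096):
    """Count whitespace-separated tokens while holding only one chunk in memory.

    Stand-in for "analysis": a token cut off at a chunk boundary is carried
    over and completed by the next chunk.
    """
    count = 0
    carry = ""
    for text in iter_decoded_chunks(stream, encoding, chunk_size):
        text = carry + text
        parts = text.split()
        if not text[-1].isspace():
            # The last token may continue in the next chunk; hold it back.
            carry = parts.pop()
        else:
            carry = ""
        count += len(parts)
    if carry:
        count += 1
    return count


if __name__ == "__main__":
    sample = io.BytesIO("naïve café menu".encode("utf-8"))
    print(count_tokens(sample, chunk_size=4))
```

Peak memory here is bounded by the chunk size plus the longest token, not by the size of the source, which is the property the proposed stream-based Field constructor is after.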