[
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049856#comment-13049856
]
Christopher Currens commented on LUCENENET-417:
-----------------------------------------------
That's a valid question. I think it's most common (though not limited to this
case) when Lucene is used to index file systems. As an example, the text
extracted from some xls files can be *shudder* in the hundreds of MB. When
accuracy is needed in a search, MaxFieldLength.Unlimited becomes important, as
we don't want silent truncation of search terms. The idea of streaming it, as
I said before, was more about managing _program memory_, especially when
multiple indexes are read/written at the same time, than about the ability to
index a large file. Granted, there are other ways to solve the problem, like
what you sort of suggested: breaking up a larger file into smaller chunks.
However, not all data is divisible the way a book is, so it's not an ideal
solution, especially if you're storing file metadata along with full text.
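
To make the memory argument concrete, here is a minimal sketch of consuming
text from a stream through a fixed-size buffer, so that peak memory depends on
the buffer size rather than on the size of the source. It is written in Java
with the standard library only and does not use any Lucene or Lucene.NET API.
The class name, the countTokens helper, and the whitespace token counting are
hypothetical stand-ins for whatever an analyzer would do with the text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamingFieldSketch {

    // Hypothetical consumer: counts whitespace-separated tokens while holding
    // at most bufferSize chars of the source in memory at any time.
    public static long countTokens(InputStream stream, Charset encoding, int bufferSize) {
        char[] buffer = new char[bufferSize];
        long tokens = 0;
        boolean inToken = false; // carried across chunk boundaries
        try (Reader reader = new InputStreamReader(stream, encoding)) {
            int read;
            while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
                for (int i = 0; i < read; i++) {
                    boolean whitespace = Character.isWhitespace(buffer[i]);
                    if (!whitespace && !inToken) {
                        tokens++; // first character of a new token
                    }
                    inToken = !whitespace;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A small in-memory stream stands in for a very large file.
        byte[] data = "large extracted text would go here".getBytes(StandardCharsets.UTF_8);
        long count = countTokens(new ByteArrayInputStream(data), StandardCharsets.UTF_8, 8);
        System.out.println(count); // prints 6
    }
}
```

The byte-array alternative would need the whole payload in memory before any
processing starts. Here the 8-char buffer is the only per-source allocation,
which is the property that matters when several indexes are being written at
the same time.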
> implement streams as field values
> ---------------------------------
>
> Key: LUCENENET-417
> URL: https://issues.apache.org/jira/browse/LUCENENET-417
> Project: Lucene.Net
> Issue Type: New Feature
> Components: Lucene.Net Core
> Reporter: Christopher Currens
> Attachments: StreamValues.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole
> binary data must be loaded into memory and then written to the index. Adding
> the ability to use a stream instead of a byte array could not only speed up
> the indexing process, but also reduce the memory footprint.
> -Java Lucene has the ability to use a TextReader to both analyze and store
> text in the index.- Lucene.NET lacks the ability to store string data in the
> index via streams. This should be a feature added into Lucene.NET as well.
> My thought is to add another Field constructor, Field(string name,
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the
> text to be analyzed and stored in the index.
> Comments about this approach are greatly appreciated.