[ 
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049331#comment-13049331
 ] 

Christopher Currens commented on LUCENENET-417:
-----------------------------------------------

Also, SimpleFSDirectory doesn't support stream indexing as well as I would 
hope.  The issue is that SimpleFSDirectory creates a RAMOutputStream that it 
uses before the data is flushed to disk.  The PerDoc class keeps the entire 
thing in memory until that flush happens.  I'm assuming it does this so 
indexes aren't corrupted.

A good idea may be to create a new Directory implementation with a special 
IndexOutput that spills its buffer to disk once a certain size limit is hit, 
to prevent OOM exceptions when indexing huge amounts of data.  However, I'm 
not sure that falls within the scope of Lucene.Net...maybe contrib?  I have 
some ideas on how to do this without leaving behind any artifacts, like temp 
files.  The easiest way seems to be MemoryMappedFile, as GC frees the file, 
even under early termination of a program.  Unfortunately, that's a .NET 4 
only class.
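
To make the spill-to-disk idea a bit more concrete, here is a minimal sketch 
of the buffering behaviour only.  It is written in Java against plain 
java.io/java.nio rather than the Lucene Directory/IndexOutput API, and the 
class name SpillingOutputStream, its threshold parameter, and its helper 
methods are all hypothetical.  It uses an ordinary temp file that is deleted 
on close, not a MemoryMappedFile, so it does not solve cleanup after an 
abrupt termination.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; not part of Lucene or Lucene.Net. */
class SpillingOutputStream extends OutputStream {
    private final int threshold;   // maximum number of bytes kept in memory
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;     // non-null once the data has spilled
    private Path tempFile;
    private long length;

    SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Switch to the temp file before the in-memory buffer would exceed the limit.
        if (disk == null && memory.size() + len > threshold) {
            spill();
        }
        if (disk != null) {
            disk.write(buf, off, len);
        } else {
            memory.write(buf, off, len);
        }
        length += len;
    }

    private void spill() throws IOException {
        tempFile = Files.createTempFile("spill", ".tmp");
        disk = Files.newOutputStream(tempFile);
        memory.writeTo(disk);      // move what was buffered so far to disk
        memory = null;             // release the in-memory copy
    }

    boolean isSpilled() {
        return disk != null;
    }

    long length() {
        return length;
    }

    /** Copies everything written so far to the given target stream. */
    void writeTo(OutputStream target) throws IOException {
        if (disk == null) {
            memory.writeTo(target);
            return;
        }
        disk.flush();
        Files.copy(tempFile, target);
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
            Files.deleteIfExists(tempFile);   // leave no temp file behind on a normal close
        }
    }
}
```

With a threshold of, say, a few megabytes, small documents would never touch 
the disk, while a very large field value would only ever hold up to the 
threshold in memory.  A real implementation would still need to plug into 
Directory and IndexOutput and handle seeking, which this sketch ignores.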

> implement streams as field values
> ---------------------------------
>
>                 Key: LUCENENET-417
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-417
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: Lucene.Net Core
>            Reporter: Christopher Currens
>         Attachments: StreamValues.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole 
> binary data must be loaded into memory and then written to the index.  Adding 
> the ability to use a stream instead of a byte array could not only speed up 
> the indexing process, but also reduce the memory footprint.
> -Java Lucene has the ability to use a TextReader to both analyze and store 
> text in the index.-  Lucene.NET lacks the ability to store string data in the 
> index via streams. This should be a feature added into Lucene.NET as well.  
> My thoughts are to add another Field constructor, that is Field(string name, 
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the 
> text to be analyzed and stored into the index.
> Comments about this approach are greatly appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
