[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015790#comment-13015790
 ] 

Simon Willnauer commented on LUCENE-2793:
-----------------------------------------

bq. Hi. I would be interested in taking this up as a GSoC project. Are there 
any resources that I can read to understand the problem in depth?

Hey, I just marked it as GSoC-able :) so you are free to take it. I would also 
volunteer to mentor on this issue if Mike doesn't want to take it. Let 
me try to quickly explain what this issue is about:

Lucene uses Directory as an abstraction on top of the filesystem, RAM, or any 
other storage to write the index data. Yet Lucene has different "stages" that 
place different requirements on the underlying storage / directory. When you 
index documents you eventually flush the index to disk and continue indexing 
until you have created enough "segments" on disk that you need to merge them. 
This operation should, if possible, not pollute the FS cache, since it's really 
just housekeeping. With Java it's currently not possible to use flags like 
DIRECT / SEQUENTIAL; this is what we have the native DirectIODirectory 
implementations in contrib for. Yet for reading from the index while 
searching we want the FS cache to help us as much as possible, so this 
again has different requirements. What we currently do is pass 
different read buffer sizes to the directory to improve performance. All this 
information should be passed to the directory on a per-IndexInput / 
IndexOutput basis (these are similar to InputStream and OutputStream, just 
tailored & enhanced for Lucene), and this is what this issue is about. 
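
To make the idea a bit more concrete, here is a rough, self-contained sketch of 
what such a context object could look like. This is not Lucene code and not the 
patch attached to the issue: the names IOContextSketch, Stage, Flag, forMerge, 
forSearch and describeOpen are all made up for illustration, and the buffer 
sizes are placeholder values. Only the idea (a buffer size plus flags like 
DIRECT / SEQUENTIAL, chosen per stage) comes from the issue itself.

```java
import java.util.EnumSet;
import java.util.Set;

public class IOContextSketch {

    /** Hints named in the issue; how a Directory honors them is up to the implementation. */
    enum Flag { DIRECT, SEQUENTIAL }

    /** Hypothetical "stages" that open index files with different requirements. */
    enum Stage { FLUSH, MERGE, SEARCH }

    /** Hypothetical context handed to the directory for each input / output it opens. */
    static final class IOContext {
        final Stage stage;
        final int bufferSize;
        final Set<Flag> flags;

        IOContext(Stage stage, int bufferSize, Set<Flag> flags) {
            this.stage = stage;
            this.bufferSize = bufferSize;
            this.flags = flags;
        }

        /** Merging is housekeeping: larger buffer, and try not to pollute the FS cache. */
        static IOContext forMerge() {
            // 4096 is an illustrative value, not a recommendation.
            return new IOContext(Stage.MERGE, 4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
        }

        /** Searching wants the FS cache to help, so no special flags are set. */
        static IOContext forSearch() {
            // 1024 is an illustrative value, not a recommendation.
            return new IOContext(Stage.SEARCH, 1024, EnumSet.noneOf(Flag.class));
        }
    }

    /** Stand-in for a Directory: it only reports how it would open the file. */
    static String describeOpen(String name, IOContext ctx) {
        boolean bypassFsCache = ctx.flags.contains(Flag.DIRECT);
        return name + ": buffer=" + ctx.bufferSize + " bypassFsCache=" + bypassFsCache;
    }

    public static void main(String[] args) {
        System.out.println(describeOpen("_0.frq", IOContext.forMerge()));
        System.out.println(describeOpen("_0.frq", IOContext.forSearch()));
    }
}
```

The point of the sketch is only that the caller states its intent once (merge 
vs. search) and the directory decides what to do with it, instead of the caller 
passing a bare read buffer size.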

Hope that helps.

> Directory createOutput and openInput should take an IOContext
> -------------------------------------------------------------
>
>                 Key: LUCENE-2793
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2793
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.
