[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015790#comment-13015790
]
Simon Willnauer commented on LUCENE-2793:
-----------------------------------------
bq. Hi. I would be interested in taking this up as a GSoC project. Are there
any resources that I can read to understand the problem in depth?
Hey, I just marked it as GSoC-able :) so you are free to take it. I would also
volunteer to mentor on this issue if Mike doesn't want to take it. Let
me try to explain quickly what this issue is about:
Lucene uses Directory as an abstraction on top of the filesystem, RAM, or any
other storage to write the index data. Yet Lucene has different "stages" that
place different requirements on the underlying storage / directory. When you
index documents you eventually flush the index to disk and continue indexing
until you have created enough "segments" on disk that you need to merge them.
This operation should, if possible, not pollute the FS cache since it's really
just housekeeping. With Java as such it is currently not possible to use flags
like DIRECT / SEQUENTIAL; this is what we have the native DirectIODirectory
implementations in contrib for. Yet for reading from the index while
searching we want the FS cache to help us as much as possible, so this again
has different requirements. What we currently do is pass different read
buffer sizes to the directory to improve performance. All this information
should be passed to the directory on a per-IndexInput / IndexOutput basis
(these are similar to InputStream and OutputStream, just tailored & enhanced
for Lucene), and this is what this issue is about.
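To make that a bit more concrete, here is a minimal sketch of what such a
per-file context could look like. It is not the attached patch and not an
existing Lucene API: IOContextSketch, Usage, ContextAwareDirectory and the
buffer sizes are all made up for illustration, and plain InputStream /
OutputStream stand in for IndexInput / IndexOutput.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical per-file I/O hints; NOT an existing Lucene class. */
final class IOContextSketch {
    /** The stage that is asking for the file (assumed names). */
    enum Usage { SEARCH, MERGE }

    final Usage usage;
    final int bufferSize;     // buffer size in bytes (illustrative values below)
    final boolean direct;     // hint: bypass the OS buffer cache
    final boolean sequential; // hint: the file is processed front to back

    private IOContextSketch(Usage usage, int bufferSize,
                            boolean direct, boolean sequential) {
        this.usage = usage;
        this.bufferSize = bufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    /** Searching: small buffer, let the FS cache help as much as possible. */
    static IOContextSketch forSearch() {
        return new IOContextSketch(Usage.SEARCH, 1024, false, false);
    }

    /** Merging: larger buffer, sequential access, stay out of the FS cache. */
    static IOContextSketch forMerge() {
        return new IOContextSketch(Usage.MERGE, 4096, true, true);
    }

    @Override
    public String toString() {
        return usage + "[bufferSize=" + bufferSize + ", direct=" + direct
                + ", sequential=" + sequential + "]";
    }

    public static void main(String[] args) {
        System.out.println(forSearch());
        System.out.println(forMerge());
    }
}

/** Hypothetical directory API: the context travels with every open/create call. */
interface ContextAwareDirectory {
    InputStream openInput(String name, IOContextSketch context) throws IOException;
    OutputStream createOutput(String name, IOContextSketch context) throws IOException;
}
```

The point of bundling the hints into one object is that a directory
implementation can decide per file how to open it, instead of only seeing a
bare buffer size.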
Hope that helps.
> Directory createOutput and openInput should take an IOContext
> -------------------------------------------------------------
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Reporter: Michael McCandless
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold
> the buffer size, but then could hold other flags like DIRECT (bypass OS's
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for
> merging is not then used for searching, and vice versa. Really, it's only
> all the open file handles that need to be different -- we could in theory
> share del docs, norms, etc, if that were somehow possible.